Application No. 10/537,280 
Amendment Dated: October 6, 2010 
Response to Office Action mailed April 6, 2010 

REMARKS/ARGUMENTS 

This amendment is filed in response to the Office Action mailed April 6, 2010 for the 
above captioned application. Reconsideration of the application as amended in view of the 
remarks herein is respectfully requested. 

Applicants request an extension of time sufficient to make this paper timely and enclose 
the appropriate fee. 

Applicants thank the Examiner for the clear statement of the status of the prior rejections. 

Anticipation Rejection 

Claims 121, 126, 127, 129, 130, 133, 136, 137 and 198 stand rejected as anticipated by 
Yoshida et al. The Examiner contends that Yoshida discloses a monoclonal antibody TRMo-2 
which falls within the scope of the claims, and transfers the burden to Applicants to test this 
antibody to show whether the characteristics are accurately reported because the USPTO lacks 
testing facilities. Applicants respectfully traverse this rejection on several grounds. 

First, Applicants submit that Yoshida et al. fails to provide an enabling disclosure of 
TRMo-2. The formation of any given monoclonal antibody is a matter of random chance. The 
specific monoclonal antibody does not appear to have been deposited with a repository of 
biological materials such as the American Type Culture Collection (ATCC), nor is it 
characterized by sequence or in some other manner that would allow a newly formed antibody to 
be identified as being the same as TRMo-2. This lack of an enabled source for this specific 
antibody makes it impossible for Applicants to test its properties or compare these properties 
directly with the antibodies they have developed. Since a disclosure must be enabling in order to 
be anticipatory, reliance on TRMo-2 as mentioned in Yoshida as evidence of anticipation is 
improper. The rejection as to all claims should therefore be withdrawn. 

With respect to claim 198, the Examiner does not provide any specific citation to a 
teaching of the recited level of affinity within the Yoshida reference. Yoshida does not provide a 
numeric affinity of TRMo-2 for the TSH receptor. It can be pointed out, however, that the 
numeric results that are provided are very different from those of the present invention. For 
example, the binding studies reported in Fig. 1 of Yoshida use antibody concentrations in the 
range of 1.25 to 10 µg/ml. In contrast, Fig. 1 of the present application shows inhibition of TSH 
binding at concentrations around 10-100 ng/ml. This indicates several orders of magnitude 
greater affinity for the receptor. This can be seen from the following Table and Figure which 
summarize the data from the two references in one place. 

Table 1 

Inhibition of 125I-TSH binding to the TSHR (%) 

Antibody concentration (ng/mL)    Yoshida TRMo-2 

The same is true of the quantitative limits set in claims 126, 127, 129, 130, 133, 136 and 
137. The Examiner has incorrectly referred to the units of those claims as "internal" units, when 
in fact these are units of an international standard. NIBSC refers to the National Institute for 
Biological Standards and Control, which is a standards agency of the United Kingdom and a 
WHO International Laboratory for Biological Standards. Standard 90/672 is a preparation 
available from NIBSC that exhibits a standard specific activity to which the specific activity of 
a sample can be compared. (See Exhibit A). 

In terms of inhibition of TSH binding (Table 1 and Figure 1, above), hMAb TSHR1 is 
approximately 300x more potent than TRMo-2. hMAb TSHR1 has a specific activity of 150 
units per mg (Table 7 in the specification), thus TRMo-2 has a specific activity of about 0.5 units 
per mg in terms of inhibition of TSH binding which is well outside any of the numeric ranges 
claimed. 

A similar comparison to that made above can be made for cAMP stimulation. 


Table 2 Stimulation of cyclic AMP production 

Stimulation of cyclic AMP production (x basal) 

Antibody concentration (ng/mL)    Yoshida TRMo-2    hMAb TSHR1 IgG 
0.01                                                2.26 
0.02 
0.1                                                 2.61 
0.2 
2 
10                                                  9 
10000                             2.3 

[Figure 2; legend: Yoshida TRMo-2] 

hMAb TSHR1 has a specific activity of 318 units per mg (Table 8 in the specification). As 
shown in Fig. 2 and Table 2 above, TRMo-2 is about 50,000x less active. Consequently, 
TRMo-2 has a specific activity of about 0.006 units per mg in terms of stimulation of cyclic 
AMP production. Again, this is well outside any of the numerical limits of the claims. 
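
The arithmetic behind the two specific activity estimates for TRMo-2 can be checked with a short sketch. It assumes, as the remarks above do, that specific activity scales inversely with the fold difference in potency; the function and variable names are hypothetical and are used only for illustration.

```python
def scaled_specific_activity(reference_units_per_mg: float, fold_less_potent: float) -> float:
    """Estimate specific activity (units/mg) of an antibody that is
    `fold_less_potent` times less potent than a reference antibody.

    Assumption: specific activity is inversely proportional to the
    fold difference in potency, as argued in the remarks.
    """
    if fold_less_potent <= 0:
        raise ValueError("fold difference must be positive")
    return reference_units_per_mg / fold_less_potent


# Inhibition of TSH binding: hMAb TSHR1 = 150 units/mg, TRMo-2 about 300x less potent.
tsh_binding_units_per_mg = scaled_specific_activity(150.0, 300.0)

# Stimulation of cyclic AMP: hMAb TSHR1 = 318 units/mg, TRMo-2 about 50,000x less active.
camp_units_per_mg = scaled_specific_activity(318.0, 50_000.0)

print(f"TSH binding inhibition: {tsh_binding_units_per_mg:.2f} units/mg")
print(f"cAMP stimulation:       {camp_units_per_mg:.4f} units/mg")
```

Under these assumptions the estimates come to about 0.5 and about 0.006 units per mg, matching the figures stated above.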

Thus, while the Yoshida antibody TRMo-2 may have worked to some de minimis extent, 
even if it could be reproduced, it comes nowhere close to meeting the numerical limitations of 
the present claims. 

Accordingly, Applicants submit that the rejections under 35 USC § 102 are in error and 
should be withdrawn. 

Obviousness Rejection 

Claims 204-209, 212 and 213 stand rejected as obvious over Yoshida, in view of UniProt 
P16473 disclosing the amino acid sequence of the human TSH Receptor, Zhong and Kohn as 
evidenced by WO 91/09137. The secondary references relate to the techniques for determining 
sequences of monoclonal antibodies and thus for making recombinant antibodies, and to motivation 
to make the recombinant antibody as now claimed. These techniques are useless, however, 
without possession by the public of the Yoshida antibody on which to perform these steps. Thus, the 
lack of enablement is fatal with respect to this assertion of obviousness. It is not possible to 
determine the sequence of the Yoshida antibody TRMo-2 from anything that Yoshida disclosed or 
enabled, and the evidence of motivation in this case is more properly understood as a long felt 
need that neither Yoshida nor anyone else in the art had answered prior to the present invention. 
Furthermore, the comments above with respect to the activity levels are relevant to this rejection 
as well. There is simply no basis to imagine that a recombinant antibody based on TRMo-2 
would have a level of activity anywhere near the levels required by the claims. 

Thus, the rejection under 35 USC § 103(a) should be withdrawn. 

Written Description 

Claims 134, 135, 200-202, 210 and 211 which recite specific sequences are rejected as 
lacking written description. The basis for this rejection is that the claims recite antibodies in 
which parts of the structure are identified as having specific sequences while other parts remain 
generic. The Examiner asserts that the specification "does not provide an adequate description of 
the structure of the antibodies such that the skilled artisan would be aware was in possession of 
the genera of claimed antibodies." Applicants respectfully disagree. 

As a first matter, it should be noted that the claims are directed to antibodies or 
fragments. Each of the sequences is the sequence of an antibody fragment, and Applicants 
clearly had possession of these fragments independent of any association with a complete 
antibody structure, as reflected in the sequence listing and original claims 22-27 of the PCT 
application as filed. Applicants further point out that the application, starting on Page 9 and 
continuing to Page 10, states that the binding partner can have a VH domain without a VL domain, 
or vice versa. Thus, the application expressly discloses molecules that comprise just one type of 
domain, as well as both types of domain. Furthermore, the application states (starting on Page 
11) that one or more CDRs can be incorporated into a suitable framework, such as a different 
antibody, to confer the binding and stimulating properties of the recited antibody. 

It is further noted that the constant domains of the human heavy chain and 
the lambda light chain are essentially constant and are well known. See Kabat 
et al. Sequences of proteins of immunological interest (US Public Health Service, Bethesda, MD) 
(1991) Vol 1. Copies of some relevant pages are attached as Ex. A. In addition, human CH1 
sequence data is available from Genbank (accession number A49444). In this case, the amino 
acid sequence of hMAb TSHR1 heavy chain amino acids 1-131 is shown (SEQ ID No 5). 
Amino acids 1-121 are the variable region (VH) domain (SEQ ID No 1) and amino acids 
122-131 are part of the constant region (CH1). Similarly, light chain amino acids 1-111 are the 
variable region domain of the light chain (VL). 

Furthermore, the purpose of the written description requirement is to ensure that the 
inventor had possession of the invention, not to require, or even encourage, a restatement of all 
the background in the relevant art. In this case, persons skilled in the art are well aware of the 
basic structure of antibodies. For example, the Zhong abstract cited by the Examiner shows 
taking a light chain VL from a monoclonal antibody and sequencing it, and says it can be used with 
other unspecified elements to make a recombinant antibody. Since the Examiner concluded that 
this was sufficient to show a person skilled in the art that making recombinant antibodies from 
monoclonal antibodies was routine, the assertion that more than the important sequences must be 
disclosed by a patent application is unfounded. 

In this regard, the Examiner is directed to the decision of the Federal Circuit in 
Capon v. Eshhar, 76 USPQ2d 1078 (Fed. Cir. 2005). In Capon, the Federal Circuit 
considered an interference proceeding in which the Patent Office Board of Appeals found that 
neither applicant's disclosure met the written description requirement. Both applications related 
to chimeric genes designed to combine DNA encoding known antigen-binding domains and 
known lymphocyte-receptor protein into a unitary gene. Both applications claim such chimeric 
genes generically. The Patent Office Board of Appeals and Interferences held that there was a 
lack of written description because the applications claimed the invention in terms of function, 
instead of specific sequences or structures. The Federal Circuit vacated this holding, finding that 
the failure to explicitly present specific sequences based on known genes did not create a basis 
for a rejection for lack of written description, and stated that no per se rule exists for the 
recitation of specific sequences. 

Here, the structure of complete antibodies is well known, but it is also known that the 
various fragments (VH, VL and CDRs) recited in the claims are what give rise to specificity for an 
antigen, in this case the TSH receptor. This is the meat that gives the binding specificity, and the 
Examiner has offered no reason (other than form paragraphs) why a person skilled in the art 
would not see a disclosure of the invention as claimed in the application as filed. Accordingly, 
there is no basis for the written description rejection as presented, and it should be withdrawn. 

Notwithstanding this view, Applicants enclose several complete articles from the art 
relating generally to the structure and construction of antibodies to show that this is indeed 
known to persons skilled in the art, and request that these be made of record and taken into 
account by the Examiner and that this knowledge be fully addressed should the rejection be 
repeated. Liang et al. Journal of Immunological Methods 2001 247: 119-130 (Exhibit C) shows 
that in the same vector different human heavy chain V region genes can be assembled with the 
same CH1, CH2 and CH3 domain genes and an appropriate lambda light chain V region gene 
with the same CL gene. These recombinant antibodies can be expressed and their antigen binding 
function is preserved. A similar approach was also described in Knappik et al. Journal of 
Molecular Biology (2000) 296: 57-86 (Exhibit D). 


Obviousness-type Double Patenting 

Applicants submit that the USPTO lacks any authority to make an obviousness-type 
double patenting rejection in the absence of a controlling statute and properly promulgated rules. 
However, since the cited application is not yet acted upon and this case should now be allowed, 
this issue is moot at this time. 


For the foregoing reasons, the considered claims of this application are believed to be in 
form for allowance. Recombination of the non-elected claims, as appropriate, is requested. To 
the extent claims will not be recombined, a telephone call to the undersigned will result in 
authorization to cancel non-elected claims. 


Conclusion 


Respectfully submitted, 



Marina T. Larson Ph.D. 


PTO Reg. No. 32,038 
Attorney for Applicant 
(970)262-1800 

THYROID STIMULATING ANTIBODY 
1ST INTERNATIONAL STANDARD 
90/672, 0.1 INTERNATIONAL UNITS/AMPOULE 
Instructions for use 


Blanche Lane 
South Mimms 
Potters Bar 
Hertfordshire EN6 3QG 
United Kingdom 


1. INTRODUCTION 

The International Standard for thyroid stimulating antibody (TSAb) consists of a batch 
of ampoules containing freeze-dried plasma proteins from a single human patient with 
high TSAb levels. The preparation has been evaluated in an international collaborative 
study and shown to possess both thyroid stimulating and thyroid receptor binding activity. 
At its 46th meeting in 1995 the Expert Committee on Biological Standardization of 
WHO established the preparation coded 90/672 as the International Standard for thyroid 
stimulating antibody, but noted that whilst the preparation is suitable for both bioassays 
and receptor assays of anti-thyroid receptor autoantibodies, the unitage in terms of 
90/672 may not be equivalent to earlier standards such as the MRC LATS-B standard 
65/122. 

2. AMPOULE CONTENTS 

Each ampoule contains the freeze-dried residue of 1.0ml of a solution which contained: 

0.02M phosphate buffer 
dialysed human plasma proteins 

3. UNITAGE 

0.1 International Units (100 milli-International Units) per ampoule by definition. 

4. CAUTION 

4.1 THIS PREPARATION IS NOT FOR ADMINISTRATION TO HUMANS. 

4.2 A safety data sheet is included in the last page of these instructions. 

4.3 The preparation contains material of human origin which has been tested and 
found negative for HBsAg and anti-HIV. However, as with all preparations of 
human origin, this material cannot be assumed to be free from infectious agents. 
Suitable precautions should be taken in the use and disposal of the ampoule and 
its contents. 


National Institute for Biological Standards and Control 
Telephone 01707 654753 Fax 01707 646730 Telex 21911 Nibsac G 

8. COLLABORATIVE STUDY 

8.1 Design of the study 

[illegible] by the participants. 

2 To calibrate each of the two preparations in terms of local standards, and to assign a 
unitage to each. 

3 To assess the stability of each of the two preparations by comparing them with 
[illegible] subjected to accelerated thermal degradation. 

8.2 Results and conclusions 

The preparation coded 90/672 [illegible] MRC research standard B for long acting [illegible]. 
The preparation was [illegible]. 

In using the preparation, with its provisional assigned unitage, users should note: 

. The ampoule [illegible] does [illegible] vary widely depending on the assay system employed. 

. The preparation is derived from a single patient, and may not be 
qualitatively suitable to serve as a standard for all TSAb samples. 

9. PRODUCT LIABILITY 

9.1 [illegible] NIBSC is given after the exercise of all reasonable care [illegible] 
in its application and use. 

[illegible] responsibility of the user to ensure that he/she has the necessary technical skills to 
determine the appropriateness of the product for the proposed application. Results 
obtained from this product are likely to be dependent on the conditions of use and the 
variability of materials beyond the control of NIBSC. 

NIBSC accepts no liability whatsoever for any loss or damage arising from the use of this 
product, whether loss of profits, or indirect or consequential loss or otherwise, including, 
but not limited to, personal injury other than as caused by the negligence of NIBSC. In 
particular, NIBSC accepts no liability whatsoever for: 

i) results obtained from this product; and/or 

ii) non-delivery of goods or for damages in transit. 

9.3 In the event of any replacement of goods following loss or damage a customer 
accepts as a condition of receipt of a replacement product, acceptance of the fact that 
the replacement is not to be construed as an admission of liability on NIBSC's behalf. 

10. ACKNOWLEDGEMENTS 

Acknowledgements are due to Professor Donald Munro for providing the plasma sample. 

11. CITATION 

To avoid scientific confusion in publications or data sheets for assay kits in which this 
Standard has been used for primary calibration, the Standard should be cited by its 
correct title and ampoule code. The correct address of this Institute which distributes 
the Standard should also be given. 

WHO Expert Committee on Biological Standardization (1989). Guidelines for the 
preparation, characterization and establishment of international and other standards and 
reference reagents for biological substances. WHO Technical Report Series 800, 181-213 
(1990). 

INTERNATIONAL STANDARD FOR THYROID STIMULATING ANTIBODY 
NIBSC Code: 90/672 
Instructions for Use (Version November 20, 1995 Initial Version) 

MATERIAL SAFETY SHEET 


Physical Properties (at room temperature) 

Physical appearance Solid 
Fire hazard None 


Chemical Properties 

Stable            Yes 
Corrosive         No 
Hygroscopic       Yes 
Oxidising         No 
Flammable         No 
Irritant          No 
Other (specify)   Contains material of human origin 


Handling: 

See precautions in section 4.3 



Toxicological Properties 

Effects of inhalation 

No adverse effects have been reported for this material 

Effects of ingestion 

No adverse effects have been reported for this material 

Effects of skin absorption 

No adverse effects have been reported for this material 


Suggested First Aid 

Inhalation 

Seek medical advice 

Ingestion 

Seek medical advice 

Contact with eyes 

Wash with copious amounts of water. Seek medical advice. 

Contact with skin 

Wash thoroughly with water 


Action on Spillage and Method of Disposal 

Spillages of vial contents should be taken up with absorbent material wetted with a 
viricidal agent. Rinse area with a viricidal agent followed by water. 

Absorbent material used to treat spillages should be treated as biologically hazardous 
waste. 



Compiled by: 

Date: 

From: Kabat et al. Sequences of proteins of immunological interest (US Public Health Service, Bethesda, 
MD) (1991) Vol 1 

Page 718 Heavy chain description of sequences shown on page 662. 

Page 662 Sequences 28-40 are human IgG1 CH-1 domains. 

Page 658 Light chain description of sequences shown on pages 653 and 654. 

Pages 653 & 654 Sequences 1-28 are human lambda constant chains. 

Abstract 

For the expression of human, intact IgG antibodies, we have constructed a set of baculovirus expression vectors designed 
to facilitate rapid insertion of heavy and light chain genes of Fab or scFv antibodies derived from phage display antibody 
libraries. By linking them to human constant or Fc regions, expression of complete human immunoglobulin molecules was 
achieved in insect cells by infection with recombinant baculovirus. The IgG expression cassette vectors are based on the 
backbone vector which contains two back to back polyhedron and p10 promoters. The IgG expression cassette elements, 
including the authentic IgG lambda or kappa and heavy chain signal sequences, as well as light chain (lambda or kappa) and 
heavy chain constant region genes are combined in a single vector and are controlled by the p10 and polyhedron promoter 
respectively. Either of VL or Fab-L and VH or Fab-Fd genes from common phage display systems can be directly inserted 
into one of the cassette vectors through in-frame cloning sites. This design of a single cassette vector combining heavy and 
light chain expression elements allowed rapid production and secretion of correctly processed and assembled intact 
immunoglobulins from recombinant baculovirus infected insect cells. The recombinant antibodies showed the expected 
molecular size of the H2L2 heterodimer in non reducing SDS-PAGE. No apparent differences were found between the 
expression level of heavy and light chains, and antigen binding function was preserved. For various antibodies, yields 
between 6 and 18 mg/l IgG were obtained. © 2001 Elsevier Science B.V. All rights reserved. 

Keywords: Cassette vector; Complete IgG antibody; Baculovirus expression; Phage display; Antibody engineering 


Corresponding author. Tel.: +86-10-635-81325; fax: +86-10-635-32053. 
E-mail address: liangmf@public3.bta.net.cn (M. Liang). 

0022-1759/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. 
PII: S0022-1759(00)00322-7 

1. Introduction 

Within the past decade, antibody phage display 
technology has been established as a proven technology 
to select scFv or Fab antibody fragments specific 
for various antigens. The phage display technology 
has particularly boosted the generation of human 
antibody fragments. Human antibodies do not induce 
an immune response against the antibody when 
applied to the human body. Therefore, human IgG 
molecules are of particular value for therapy and in 
vivo diagnostics, but cannot be easily obtained by 
non-recombinant methods. E. coli cells, despite 
being optimal for the recombinant antibody phage 
selection procedure, are on the other hand not 
suitable to produce complete functional IgG mole- 
cules, since they cannot provide appropriate folding, 
disulfide bond formation and several post-translation- 
al modifications. The phage selection process yields 
the antigen binding regions only, either in form of 
scFv or Fab antibody fragments, which have to be 
converted to the complete diheterotrameric IgG 
molecule. Since the phage display selections usually 
result in a number of candidate antibody fragments, 
and functional tests frequently require complete IgG, 
a eukaryotic expression system is required for fast 
and convenient production. 

The baculovirus expression system has been ap- 
plied widely for the expression and study of genes 
and their proteins from various sources. Furthermore, 
it has already been established as a reliable system 
for the production of complete chimeric, humanized 
or human IgG that are similar to native molecules 
both structurally and functionally (Nesbit et al., 
1992; Poul et al., 1995a,b; Liang et al., 1997a; Porter 
et al., 1997; Tan and Lam, 1999). In contrast to the 
quite time consuming establishment of a stable 
mammalian IgG expression cell lines, the baculovirus 
expression system offers obvious advantages 
by saving time and allowing rapid scale up of 
production. 

Cassette transfer vectors for the expression and 
secretion of intact human antibodies have been 
reported previously (Poul et al., 1995a,b). However, 
these systems require a combination of two vectors 
which separately served for VH and VL cloning, 
requiring careful and time consuming adjustment of 
titers of the two respective recombinant baculo- 
viruses. We present here a set of single, universal 
baculovirus human or humanized antibody expres- 
sion vectors with authentic IgG light and heavy 
chains signal sequences, which were specifically 
designed for direct cloning of the heavy and light 
genes of Fab or scFv antibodies selected from phage 


display libraries. By linking them to a human 
constant region, a complete IgG is expressed and 
secreted by insect cells after infection with a single 
recombinant baculovirus clone. The functionality of 
these vectors has been verified for various VH/VL or 
Fab genes obtained from phage display libraries. 


2. Materials and methods 

2.1. Materials 

Cloning vectors PCR™II and pUC18, pGEM5zf 
were purchased from Invitrogen (Groningen, Netherlands) 
or Promega (Mannheim, Germany). Backbone 
vector pAcUW51, Sf9 cells and H5 cells were 
purchased from Pharmingen (Heidelberg, Germany). 
E. coli DH5α used for cloning was purchased from 
GibcoBRL (Karlsruhe, Germany). FITC or HRP 
conjugated anti-human Fab, human Kappa, lambda 
and Fc antibodies were purchased from Sigma 
(Munchen, Germany). The hantaan virus vero-E6 
antigen slides were provided by Progen (Heidelberg, 
Germany). The human VH, VL or Fab genes were 
isolated in our lab or provided by the Institute of 
Virology, Chinese Academy of Preventive Medicine, 
Beijing, China. Human standard IgG for standardiza- 
tion of the ELISA was purchased from Sigma 
(Munchen, Germany). 

2.2. PCR amplification of the genetic elements for 
the IgG expression vector cassette 

All standard cloning procedures were carried out 
as described by Sambrook et al. (1989). Oligonucleotide 
primers used for the PCR amplification of 
heavy and light chain genes of IgG expression 
cassette are listed in Table 1. Total cellular RNA was 
prepared from pelleted human lymphocytes using 
Trizol reagent (Gibco/BRL, USA). cDNA was synthesized 
using oligo(dT) primers and reverse transcriptase 
(Gibco/BRL, USA). The full length IgG1 heavy 
chain gene was amplified with primers IgG NSVH3 
and IgG CH3 (Table 1). Thirty-five PCR cycles were 
performed, with incubations for 1 min at 94°C, 1 min 
at 54°C and 3 min at 72°C each. The 69 base pair 
oligonucleotides of kappa and lambda leader sequence 
DNAs containing mutated cloning sites SacI and 
EcoRI were amplified with primers VKNSF and 
VKNSR or VLNSF and VLNSR, respectively, by 30 
PCR cycles at 94°C for 1 min, 54°C for 1 min 
and 72°C for 1 min. With primers VK-Hind-F and 
CK-EcoR-R or VL-Hind-F and CL-EcoR-R, the 
constant region genes of kappa or lambda chains 
containing the VL cloning sites SacI and HindIII 
were obtained by a similar PCR amplification of the 
kappa or lambda genes of two human derived Fab 
antibodies (Liang et al., 1997b). All of the amplified 
PCR fragments were purified by using Qiagen Gel-extraction 
kit (Qiagen, Hilden, Germany) and stored 
at -20°C for subsequent cloning. 

PCR primers used for the construction of the human IgG expression cassette vectors 

Name         Oligonucleotide sequences                          Cloning sites 
NSVH3F       5'-CGCGGATCCACCATGGAGTTTGGGCTGAGC-3'               BamHI 
             5'-CGCGGATCCCTCCTGCGTGTAGTGGTTGTGC-3'              BamHI 
             5'-GGAAGATCTCACCATGGAAACCCCAGCGCA-3'               BglII 
[rows illegible in source] 
VK-Hind-F    5'-ACACTTGAGCTCAGGGGACCAAGCTTGAGATC-3'             SacI, HindIII 
CK-EcoR-R    5'-CCGGATATCTAGAACTAACACTCTCCCCTGTTGA-3' 
VL-Hind-F    5'-TGGGTTGAGCTCGGAGGGACCAAGCTTACCGTC-3'            SacI, HindIII 
CL-EcoR-R    5'-CCGGATATCTAGAACTATGAACATTCTGTAGG-3'             EcoRV 
VH3-NS-R     5'-CCCAGACTCGAGCAGTTGCACCTC-3'                     XhoI 
CH1-XN-F     5'-CCGCTCGAGCGTCTCCTCAGCTAGCACCAAGGGCCCATC-3'      XhoI, NheI 
CH2-XS-F     5'-CCGCTCGAGCGGTGACAAAACTAGTACATGCCCACCGTGCC-3'    XhoI, SpeI 
pAc-Bam-F    5'-CCTATAAATACGGATCCGGTTAT-3'                      BamHI 
HindMuF      5'-CGTAAACACGTTAAATAGAGCTTGGACA-3'                 HindIII-Mutant 
HindMuR      5'-TGTCCAAGCTCTATTTAACGTGTTTACG-3'                 HindIII-Mutant 
pAc-Sca-R    5'-TGACTGGTGAGTACTCAACCAAGT-3'                     ScaI 
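
As a rough planning aid, the two cycling programs described in section 2.2 can be compared by their total hold time. The sketch below is illustrative only: the class and function names are hypothetical, and it counts only the stated hold times, since the text gives no ramp times, initial denaturation or final extension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold within a PCR cycle."""
    temp_c: int
    minutes: float


def cycling_minutes(steps: list[Step], cycles: int) -> float:
    """Total hold time in minutes for `cycles` repetitions of `steps`."""
    return cycles * sum(step.minutes for step in steps)


# Heavy chain gene: 35 cycles of 1 min at 94 C, 1 min at 54 C, 3 min at 72 C.
heavy_chain_program = [Step(94, 1), Step(54, 1), Step(72, 3)]
heavy_chain_total = cycling_minutes(heavy_chain_program, cycles=35)

# Leader sequences: 30 cycles of 1 min at 94 C, 1 min at 54 C, 1 min at 72 C.
leader_program = [Step(94, 1), Step(54, 1), Step(72, 1)]
leader_total = cycling_minutes(leader_program, cycles=30)

print(f"Heavy chain program: {heavy_chain_total:.0f} min of hold time")
print(f"Leader program:      {leader_total:.0f} min of hold time")
```

The longer 72°C extension in the heavy chain program is consistent with amplifying a full length gene rather than a 69 base pair leader fragment.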

2.3. Construction of recombinant baculovirus IgG 
expression cassette vectors 

Four recombinant baculovirus IgG expression 
vectors (pAc-K-CH3, pAc-L-CH3, pAc-K-Fc and 
pAc-L-Fc) with different combinations of the essential 
genetic elements for heavy and light chain 
expression have been constructed as follows. The 
PCR product of the full length human heavy chain 
DNA including the authentic IgG1 heavy chain 
leader DNA was cloned into the BamHI site of 
plasmid pUC18. To remove the gene fragments 
encoding for the antigen binding parts, the fragments 
containing the modified heavy chain gene with leader 
sequence and complete constant region or Fc portion 
were PCR amplified with the primer sets VH3-NS-R 
and CH1-XN-F or VH3-NS-R and CH2-XS-F (Table 
1) from two opposite ends using the PUC18-heavy 
chain vector as a template. The resulting PCR 
products omitted the VH or VH-CH1 region of the 
original heavy chain fragment; they were self ligated 
and transformed into E. coli DH5α competent cells. 
Above steps resulted in two transfer vectors containing 
heavy chain expression cassettes: 1. PUC-H-NheI, 
which contains the IgG1 heavy chain leader 
sequence, the VH in-frame cloning sites XhoI and 
NheI and the complete constant region; and 2. PUC-H-SpeI, 
which contains the same leader sequence, 
the Fd in-frame cloning sites XhoI and SpeI and the 
Fc region. 

To obtain the vector pAc-K-CH3 or pAc-L-CH3, 
the original HindIII site in the backbone vector 
pAcUW51 was mutated by assembly PCR with 
primers pAc-Bam-F, HindMuF, HindMuR, pAc-Sca-R 
(Table 1). In this step, the HindIII site was 
replaced by the sequence GAGTTC. The purified 
PCR products encoding kappa or lambda signal 
sequences followed by VL-CL fragment cloning site 
SacI and EcoRI were ligated into the BglII site of 
the backbone vector pAcUW51, resulting in two 
transfer vectors: pAc-K-leader and pAc-L-leader. 
The constant region fragment genes of kappa or 
lambda chains were then cloned into the above 
transfer vectors pAc-K-leader or pAc-L-leader 
through SacI and EcoRI, resulting in two new 
transfer vectors: pAc-K-CK and pAc-L-CL. The 
heavy chain expression cassette was cleaved from 
the vector PUC-H-NheI with BamHI and cloned into 
pAc-K-CK or pAc-L-CL vectors, to finally yield the 
two expression vectors pAc-K-CH3 and pAc-L-CH3. 
In a similar way, the DNA fragment of the heavy 
chain expression cassette was obtained from the 
respective vector PUC-H-SpeI and cloned into the 
BamHI site of pAc-K-leader or pAc-L-leader, finally 
to yield the vectors pAc-K-Fc and pAc-L-Fc. All 
mutations, insertions or deletions of the resulting 
vectors were controlled by DNA sequencing by 
using the ABI auto-sequence kit and the ABI 310 
automated capillary sequencer (Perkin Elmer, 
Langen, Germany). 
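
The assembly PCR step above relies on HindMuF and HindMuR being a complementary mutagenic pair that no longer carries the HindIII recognition sequence (AAGCTT). A minimal sketch of that check follows. The two primer sequences are an assumption: they are transcribed from the primer table with the unreadable characters read as T, and the helper name is hypothetical.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Assumed transcriptions of the mutagenic primers (5' to 3').
HIND_MU_F = "CGTAAACACGTTAAATAGAGCTTGGACA"
HIND_MU_R = "TGTCCAAGCTCTATTTAACGTGTTTACG"
HINDIII_SITE = "AAGCTT"

# A forward/reverse mutagenic pair should anneal to each other exactly.
is_complementary_pair = reverse_complement(HIND_MU_F) == HIND_MU_R

# After mutation, neither primer should contain an intact HindIII site.
site_destroyed = HINDIII_SITE not in HIND_MU_F and HINDIII_SITE not in HIND_MU_R

print("Primers are reverse complements:", is_complementary_pair)
print("HindIII site absent from both primers:", site_destroyed)
```

Because AAGCTT is palindromic, checking one strand would suffice; both are checked here only to keep the sketch explicit.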


2.4. Insertion of V region or Fab genes 

Various human or mouse recombinant IgG Fab
fragment genes derived from phage display antibody
libraries or mouse hybridoma cells were used to test
the efficiency of the universal vector system. According
to the sequence information of each individual
antibody gene, PCR amplification of
the variable regions of human or mouse IgG Fab
antibody genes was performed with the forward primers
reported by Kang et al. (1991) and the following
reverse primers: VH-NheIR: 5'-TGG GCC CTT GGT
GCT AGC TGA GGA GAC GGT GACC-3'; VL-HindIIIR:
5'-GAC GGT AAG CTT GGT CCC TCC-3';
VK-HindIIIR: 5'-GGATCTCAAGCTTGGTCCCCT-3';
Mu-HindIIIR1: 5'-CAG CTC CAA GCT
TGG TCC CAC CAC CGAA-3' and Mu-HindIIIR2:
5'-TT CAG CTC AAG CTT GGT CCC GAA CG-3'.
The PCR products were digested with XhoI and
NheI (heavy chain DNA) or SacI and HindIII
(light chain DNA) and cloned into the vector
pAc-K-CH3 or pAc-L-CH3 in accordance with the
original kappa or lambda type. The Fab genes
inserted into vectors pAc-K-Fc or pAc-L-Fc were
directly cleaved from the pComb3 phagemid based
Fab clones (Barbas et al., 1991) with XhoI and SpeI
or SacI and XbaI. The respective light chain genes
were subcloned into transfer vector pEG5F (Promega,
Mannheim, Germany), and subsequently cloned
into the SacI and EcoRV sites of the above two vectors.
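
The reverse primers listed above each carry the restriction site named in the primer. A minimal sketch of a sanity check follows, assuming the standard recognition sequences GCTAGC (NheI) and AAGCTT (HindIII); the function names are illustrative only and are not part of the original protocol.

```python
# Recognition sequences of the two enzymes used on the reverse primers.
# Both sites are palindromic, so primer orientation does not matter here.
SITES = {"NheI": "GCTAGC", "HindIII": "AAGCTT"}

# Reverse primers as printed in the text (spaces removed), paired with the
# enzyme that each primer name suggests.
PRIMERS = {
    "VH-NheIR": ("TGGGCCCTTGGTGCTAGCTGAGGAGACGGTGACC", "NheI"),
    "VL-HindIIIR": ("GACGGTAAGCTTGGTCCCTCC", "HindIII"),
    "VK-HindIIIR": ("GGATCTCAAGCTTGGTCCCCT", "HindIII"),
    "Mu-HindIIIR1": ("CAGCTCCAAGCTTGGTCCCACCACCGAA", "HindIII"),
    "Mu-HindIIIR2": ("TTCAGCTCAAGCTTGGTCCCGAACG", "HindIII"),
}


def site_positions(seq: str, site: str) -> list[int]:
    """Return the 0-based start index of every occurrence of `site` in `seq`."""
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def check_primers() -> dict[str, list[int]]:
    """Map each primer name to the positions of its expected restriction site."""
    return {
        name: site_positions(seq, SITES[enzyme])
        for name, (seq, enzyme) in PRIMERS.items()
    }


if __name__ == "__main__":
    for name, hits in check_primers().items():
        print(name, hits)
```

A primer that returned an empty list would indicate a transcription error in the sequence or a mismatch between primer name and enzyme.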


2.5. Preparation of recombinant baculovirus for 
the expression of intact human or mouse/human
chimeric IgG 

Recombinant baculoviruses were prepared by
homologous recombination using the BaculoGold
transfection kit (Pharmingen, Heidelberg, Germany)
according to the instructions given by the supplier.
Recombinant baculovirus was harvested 4-5 days
after transfection from the culture supernatants of
Sf9 cells, and subsequent plaque purification
was performed to obtain pure, high-titer recombinant
virus. Intracellular heavy and light chain expression
in the insect cells was tested by immunofluorescence
using FITC conjugated anti-human Fc
and anti-human Fab antibodies. Secreted recombinant
human IgG antibodies were detected in the
supernatants of infected Sf9 cells by a conventional
capture ELISA, using goat anti-human IgG Fab as a
capture reagent and HRP conjugated anti-human Fc
for detection.

2.6. Expression and purification of baculovirus 
expressed IgG antibody 

Sf9 cells or H5 cells were infected with the
recombinant viruses expressing various IgG antibodies
at an m.o.i. of 10, grown in serum-free
medium (GibcoBRL, Karlsruhe, Germany) and incubated
in T75 flasks at 27°C until approximately
50-60% of dead cells were observed (approx. 4-5 days
postinfection). The supernatants of recombinant
baculovirus infected insect cells were harvested,
clarified by centrifugation, filtered through 0.45 µm
filters and applied to a Protein G-Sepharose CL-4B
column (Pharmacia, Braunschweig, Germany). The IgG
fraction was eluted with 1.0 M glycine-HCl, pH 2.7,
neutralized with 1 M Tris, and then applied to a
desalting column eluted with 0.02 M sodium phosphate,
pH 7.0. IgG concentrations were estimated
according to Harlow and Lane (1988), and adjusted to
a concentration of 200-500 µg/ml.

2.7. SDS-PAGE and western-blot 

All expressed and purified IgG samples were
analyzed under reducing and non-reducing conditions
on 10% polyacrylamide SDS-gels. SDS-PAGE was
carried out according to Laemmli (1970). Total
protein staining was achieved with Coomassie Brilliant
Blue R250 (Serva, Heidelberg, Germany).
Immunoblots were performed essentially according
to Towbin et al. (1980) and blocked with PBS-5%
milk for 2-3 h before incubation with a mixture of
HRP conjugated goat anti-human IgG and HRP
conjugated goat anti-human kappa+lambda chain
antibodies. TMB Stabilized Substrate for HRP
(Promega, Madison, USA) was used for visualizing
bound enzymatic activity.

2.8. Immunofluorescence

For the detection of IgG antibodies expressed in
Sf9 cells, the cells cultured in T25 flasks were
infected with recombinant viruses for 3-4 days. The
cells were washed and suspended in PBS prior to
fixation on 10 well slides with acetone at room
temperature for 10 min. FITC conjugated anti-human
Fab and Fc antibodies were subsequently incubated
with the slides at 37°C for 30 min and washed 3
times with PBS prior to embedding for microscopic
examination. To determine the specificity of the
expressed IgG antibody, the immunofluorescence test
for hantavirus infection from Progen (Heidelberg,
Germany) was used. The supernatants of recombinant
baculovirus infected Sf9 cells containing IgG antibodies
to hantavirus nucleocapsid protein (Liang et
al., 1997b) were used as control. Bound antibodies
were detected with FITC conjugated anti-human Fab
and anti-human Fc as described above.

2.9. Quantitative IgG ELISA 

The Sf9 cells or H5 cells in T75 flasks were
infected with 10^7 baculoviruses and incubated at 27°C
for 4-5 days until 50-60% of the cells had died.
The infected supernatants were harvested and tested
by sandwich ELISA. Commercially available
purified human IgG was used for a calibration curve
by preparing a two-fold dilution series in PBS.
ELISAs were performed by coating anti-human Fab
antibody onto a 96-well ELISA plate at 4°C overnight.
Then, the wells were incubated with culture
supernatants or the purified IgG dilutions. All washings
were done with PBS. Bound immunoglobulins
were detected by incubation with HRP conjugated
anti-human IgG Fc antibody and visualised using
soluble TMB substrate (Promega, Madison, USA).
The OD values at 450 nm of each sample were
compared to the IgG calibration curve to calculate
the protein concentration.
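
The read-out step can be sketched as follows. This is an illustration only: the standard concentrations and OD values are hypothetical, and piecewise-linear interpolation between neighbouring standards is an assumption, since the text does not state how the calibration curve was fitted.

```python
def twofold_series(top: float, n: int) -> list[float]:
    """Concentrations of an n-point two-fold dilution series starting at `top`."""
    return [top / 2 ** i for i in range(n)]


def interpolate_conc(od: float, standards: list[tuple[float, float]]) -> float:
    """Estimate a concentration from an OD450 reading.

    `standards` holds (concentration, OD) pairs whose OD rises strictly with
    concentration. The reading is interpolated linearly between the two
    flanking standards; readings outside the curve are rejected.
    """
    pts = sorted(standards, key=lambda p: p[1])
    if not pts[0][1] <= od <= pts[-1][1]:
        raise ValueError("OD outside calibration range; re-assay at another dilution")
    for (c_lo, od_lo), (c_hi, od_hi) in zip(pts, pts[1:]):
        if od_lo <= od <= od_hi:
            frac = (od - od_lo) / (od_hi - od_lo)
            return c_lo + frac * (c_hi - c_lo)
    raise AssertionError("unreachable for a strictly increasing curve")


# Hypothetical calibration data: ng/ml of purified human IgG and OD450 readings.
CONCS = twofold_series(1000.0, 5)
ODS = [1.8, 1.2, 0.7, 0.4, 0.2]
STANDARDS = list(zip(CONCS, ODS))

if __name__ == "__main__":
    sample_od = 0.95          # hypothetical reading of a diluted supernatant
    dilution_factor = 20      # hypothetical sample dilution
    print(interpolate_conc(sample_od, STANDARDS) * dilution_factor, "ng/ml")
```

Rejecting readings outside the standard range avoids extrapolating beyond the measured part of the curve.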


3. Results 

3.1. Design and construction of baculovirus human 
IgG expression cassette vectors 

A set of universal baculovirus expression cassette
vectors (pAc-K-CH3, pAc-L-CH3, pAc-K-Fc and
pAc-L-Fc) for the production of human IgG has been
designed and constructed as illustrated in Fig. 1. The
cassette vectors were specifically designed for direct
insertion of the antibody Fv or Fab genes selected
from phage display libraries and linking them to the
human IgG constant region, resulting in intact IgG
expression vectors with heavy and light chain expression
elements in one construct. The heavy chain
elements are under control of the polyhedrin promoter,
followed by the 66 bp authentic IgG signal
sequence DNA from the IgG1 subgroup VHIII family
and by the mutated in-frame cloning sites XhoI and
NheI for cloning scFv VH genes (Fig. 1A) or XhoI
and SpeI for cloning Fab Fd genes (Fig. 1B). The
whole constant region gene of human IgG1 (Fig. 1A)
or the Fc region gene of human IgG1 (Fig. 1B) is
located further downstream of the signal sequence
DNA and these cloning sites. In the same vector, in
the opposite orientation to the heavy chain operon,
the light chain elements are under
the control of the P10 promoter, followed by the 69 bp
authentic signal sequence DNA of the human kappa or
lambda chain (Fig. 1A,B). This is followed by the
mutated in-frame cloning sites SacI and HindIII for
cloning of V lambda or V kappa region genes,
followed by the constant region genes of lambda or
kappa chains (Fig. 1A). The original HindIII site in
the vector pAcUW51 was mutated in both vectors
pAc-K-CH3 and pAc-L-CH3. In the vectors pAc-K-Fc
and pAc-L-Fc, the cloning sites SacI and EcoRV
were introduced adjacent to the signal sequences to
allow cloning of kappa or lambda chain Fab genes
(Fig. 1B).

[Figure 1 panels: A, pAc-K-CH3/pAc-L-CH3, showing the VH insertion region and the Vκ and Vλ insertion regions; B, pAc-K-Fc/pAc-L-Fc, showing the Fd insertion region and the light chain insertion region. Sequence detail not recoverable.]

Fig. 1. Cassette baculovirus vectors for human IgG expression. A, vectors for the cloning of V region DNA obtained from scFv phage
display systems; B, vectors for the cloning of Fab region DNA obtained from Fab phage display systems. The vector backbone in all cases
(outside BglII/BamHI) is pAcUW51. Abbreviations: aa, amino acid number; C(x), constant human immunoglobulin regions; H, K or L
leader, sequence coding for the respective human immunoglobulin signal peptides. The genetic elements are not drawn to scale.

As the cassette vectors were designed with the
purpose of cloning different forms of antibody genes
from phage display libraries, the insertion sites for
cloning of V genes in the vectors were chosen on the
basis of their low cutting frequencies in human
antibody V region genes (Persic et al., 1997) and with
regard to amino acid sequence conservation at
the sites. The light and heavy chain cloning sites of
the pComb3 phage display system (Barbas et al.,
1991) were introduced into the vectors. Therefore,
Fab genes from this phagemid could be directly
cloned into the vectors pAc-K-Fc or pAc-L-Fc.
Other forms of antibody fragments derived from
different sources including human or rodent phage
display libraries (McCafferty et al., 1990; Pope et al.,
1996; Breitling and Dübel, 1997) could be inserted
after a PCR to introduce the respective sites. As
indicated in Fig. 1, the heavy chain 5' cloning site XhoI
must start at 72 bp (amino acid (aa) position 24,
counted from the ATG of the IgG1 gene amplified from
mRNA), with CTC=Leu; the 3' cloning site NheI must start
at the first amino acid of the constant region, with
GCT=Ala; SpeI must start at 318 bp (aa106 of the
constant region, with ACT=Thr). The kappa or
lambda 5' cloning site SacI must start at 69 bp
(aa23) to retain the ATG of kappa or lambda genes
from mRNA; the 3' cloning site HindIII must start at
18 bp (aa6, the first bp of framework 4, with AAG=Lys).
The mutations introduced in all four vectors
have been confirmed by DNA sequencing.
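
The coordinates and codons quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes the standard recognition sequences of the enzymes and the standard genetic code, and it treats each quoted bp value as a whole number of codons; these are working assumptions for illustration, not statements from the paper.

```python
# Standard recognition sequences of the cloning enzymes named in the text.
SITES = {
    "XhoI": "CTCGAG",
    "NheI": "GCTAGC",
    "SpeI": "ACTAGT",
    "SacI": "GAGCTC",
    "HindIII": "AAGCTT",
}

# Only the codons needed here (standard genetic code).
CODON_TO_AA = {"CTC": "Leu", "GCT": "Ala", "ACT": "Thr", "AAG": "Lys", "GAG": "Glu"}

# Residues stated in the text for the first codon of each in-frame site.
STATED_RESIDUE = {"XhoI": "Leu", "NheI": "Ala", "SpeI": "Thr", "HindIII": "Lys"}

# (bp coordinate, amino acid position) pairs as quoted in the text.
STATED_POSITION = {"XhoI": (72, 24), "SpeI": (318, 106), "SacI": (69, 23), "HindIII": (18, 6)}


def first_codon_residue(enzyme: str) -> str:
    """Residue encoded by the first three bases of the recognition site."""
    return CODON_TO_AA[SITES[enzyme][:3]]


def aa_position(bp: int) -> int:
    """Convert a bp coordinate into a codon count; it must be in frame."""
    if bp % 3 != 0:
        raise ValueError(f"{bp} bp is not a whole number of codons")
    return bp // 3


if __name__ == "__main__":
    for enzyme, residue in STATED_RESIDUE.items():
        print(enzyme, first_codon_residue(enzyme), residue)
    for enzyme, (bp, aa) in STATED_POSITION.items():
        print(enzyme, bp, aa_position(bp), aa)
```

Under the same assumptions, the first codon of the SacI site (GAG) encodes glutamic acid.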


3.2. Expression in insect cells of functional 
recombinant human IgG antibodies derived from 
antibody variable or Fab genes 

To check the function of the new vectors, various
V region or Fab antibody genes were cloned into the
vectors, comprising the VH, Vκ and Vλ genes of
antibodies against hantavirus, hepatitis A virus and
rabies virus (unpublished data). VH fragments derived
from display libraries were cloned into the XhoI
and NheI sites, and Vκ or Vλ were cloned into the SacI
and HindIII sites of the vectors pAc-K-CH3 or
pAc-L-CH3, respectively, in accordance with the original
chain type. Fab antibody genes obtained from phage
libraries or mouse hybridomas were directly cloned
into the XhoI and SpeI sites (Fd) and the SacI and
EcoRV sites of the vectors pAc-K-Fc or pAc-L-Fc.

After plaque purification of the respective recombinant
baculoviruses, the IgG antibodies expressed in
Sf9 cells and H5 cells were analysed after infection.
The strong cytoplasmic fluorescence of infected Sf9
insect cells detected with FITC conjugated anti-human
Fc and Fab antibodies (Fig. 2) demonstrated
that the human antibody Fab and Fc regions were well
expressed.

Fig. 2. Immunofluorescence assay of recombinant human IgG expressed in Sf9 cells. A, detection by FITC conjugated anti-human IgG Fc
antibodies. B, detection by FITC conjugated anti-human IgG Fab antibodies.

To determine whether the recombinant human IgG
immunoglobulins expressed either from pAc-K(L)-CH3
or pAc-K(L)-Fc vectors were correctly processed,
assembled and secreted from the recombinant
baculovirus infected insect cells, supernatants of
cultures were collected. The recombinant immunoglobulins
were recovered by binding to protein G-Sepharose.
Heavy and light chains of the expected
sizes were observed by SDS-PAGE under reducing
conditions (Fig. 3a) and verified by incubating with
HRP conjugated anti-human IgG Fc and anti-human
light chain antibodies on immunoblots (Fig. 3c). The
purified recombinant human IgG molecules showed
an identical migrating size when compared to human
blood derived IgG molecules under non-reducing
conditions in SDS-PAGE (Fig. 3b).
Similarly, bands with the expected molecular mass of
IgG heavy and light chains were detected under
reducing conditions. These results demonstrated that
the recombinant human IgG antibodies produced in
insect cells were correctly assembled and secreted
into the recombinant baculovirus infected insect cell
culture medium as heterodimeric H2L2 immunoglobulins.

Fig. 3. SDS-PAGE and immunoblot analysis of recombinant human IgG produced by Sf9 cells. A, B, Coomassie blue stain; C,
immunoblot. Antibodies purified from the supernatants of recombinant baculovirus infected Sf9 cells were analysed by SDS-PAGE using
reducing (A) or non-reducing (B) conditions. Lane A-1, size marker; lane A-2, IgG from vector pAc-L-CH3; lane A-3, IgG from vector
pAc-L-Fc. Lane B-1, IgG from vector pAc-L-CH3; lane B-2, IgG from vector pAc-L-Fc; lane B-3, control (human IgG fraction obtained
from Sigma, Deisenhofen, Germany). C, immunoblot staining with HRP conjugated anti-human IgG (lane C-1, IgG from vector
pAc-L-CH3; lane C-2, IgG from vector pAc-L-Fc).

The expression levels provided by the universal
IgG expression vectors were analyzed by using 5
human antibodies derived from phage antibody
libraries which recognized hantavirus glycoprotein
G1, hantavirus nucleocapsid protein (Liang et al.,
1997b), hepatitis A VP1 proteins (Chao et al., 2000)
and rabies virus (unpublished data). The ELISA to
quantify the yield of secreted antibodies was standardized
by comparing the titer of human IgG
antibodies in the culture medium with human IgG
immunoglobulin from blood donors. The expression
levels of secreted human antibodies were around
6-18 mg/l (Fig. 4).

3.3. Functional analysis of the recombinant human 
antibodies expressed in insect cells 

The function of the 5 baculovirus/insect cell
expressed recombinant human antibodies was confirmed
by specific binding to their related target
antigens or by neutralizing target viruses. All IgGs
showed the functional activity of their parental
antibody fragments. For example, we compared a
human Fab fragment to hantavirus nucleocapsid protein
obtained by phage display and the human IgG
antibody derived from it for their binding specificity
on hantavirus infected Vero-E6 cells. The baculovirus
expressed human IgG antibody showed strong
binding (Fig. 5a) with the typical pattern obtained for
intracellular virus stains, as demonstrated by the
parental Fab fragment (Fig. 5b).

Fig. 4. Yields of recombinant human IgG antibodies secreted into
the cell culture medium of High Five insect cells infected by
recombinant baculoviruses. The production levels of various
clones (x-axis: antibody clone) were determined by comparison to a
calibration curve using human immunoglobulins of known concentration.
Samples: 100IgG, R-IgG to Hantavirus G1 (vector pAc-L-CH3); 23IgG,
R-IgG to Hantavirus G1 (vector pAc-L-Fc); 30IgG, R-IgG to
Hantavirus NP (vector pAc-K-CH3); HAV9 and HAV16, R-IgGs
to Hepatitis A (vector pAc-L-Fc); RV, R-IgG to Rabies virus
(vector pAc-K-Fc).


4. Discussion 

To achieve rapid expression of complete human
immunoglobulins, we have designed and constructed
a set of universal human antibody expression vectors
with authentic IgG light and heavy chain signal
sequences, mutant in-frame cloning sites and human
IgG constant or Fc regions. These vectors allow facile
cloning of the heavy and light chain genes of Fab or
scFv antibodies selected from phage display libraries
or hybridomas and their expression as intact IgG
antibodies in the recombinant baculovirus/insect cell
system. The functions of the recombinant human
(Fig. 4) or human-mouse chimeric antibodies (data
not shown) produced with our cassette vector system were
verified by specific binding to their target antigens.

A quick procedure for expressing intact human
IgG antibodies derived from Fab or scFv genes
selected from phage display libraries has turned out
to be essential for various methods of functional
characterization, in particular in animal models.
Prokaryotic expression systems offer the quickest
solution for the expression of foreign genes, but they
cannot produce and correctly fold the entire IgG
diheterotetramer. This can only be achieved in
eukaryotic systems. Mammalian cell expression systems
have been widely used to produce functional
IgG. However, the yield of transient expression
systems is usually low, and achieving stable expression
requires time-consuming selection procedures.
Baculovirus/insect cell systems, in contrast, combine
the advantages of time saving, correct protein
folding and stable and large scale expression
(Hasemann and Capra, 1990).

The intact IgG expression cassette vectors
described in this paper are based on the backbone
vector pAcUW51, which has been successfully used
before to express an IgG to hantavirus glycoprotein
G1 which retained the neutralizing activity of its
parental human hybridoma monoclonal antibody
(Liang et al., 1997b). The universal cassette described
in this study, however, by encoding both
heavy and light chains under the control of different
back-to-back promoters, avoids the disadvantage of
previous separate expression vector systems for H
and L chains. These required significant effort to
adjust the expression levels of heavy and light chains
by evaluating the optimal ratio of the different
recombinant baculovirus titers, as well as two
separate steps of baculovirus recombination. As
these vectors will be used for rapid
and simple cloning of Fab or scFv antibody genes
from phage display libraries, we paid particular
attention to the design of the restriction cloning sites.
Most of the introduced in-frame unique restriction
sites (XhoI and NheI for the insertion of the VH region
and SacI and HindIII for the insertion of the VL region)
in the vectors pAc-K-CH3 or pAc-L-CH3 do not
result in an amino acid exchange, except for the
change to introduce the SacI site, resulting in a
mutation from a neutral valine to a negatively
charged glutamic acid residue. Although some data
have shown that FR1 residues can considerably influence
antigen binding and antibody affinity (Xiang
et al., 1991), previous investigations have demonstrated
that the introduction of this particular mutation
did not influence the antibody affinity (Bender et
al., 1993). In the heavy chain cassette of the vectors
pAc-K-Fc or pAc-L-Fc, the introduction of the SpeI
site resulted in an amino acid change from histidine
to serine at the beginning of the FR2 region, thus
decreasing the hydrophilicity of the amino acid at the
site. This change, however, did not influence the
functionality of any of the three antiviral neutralizing
antibodies tested in this study. For example, Fab genes of a
neutralizing antibody to hepatitis A selected by
phage display were directly cloned into the vector
pAc-L-Fc, which contains the SacI mutation in the
light chain and the SpeI mutation in the heavy chain
gene. The resulting recombinant human IgG antibody
maintained the specificity and showed a better
neutralizing activity for hepatitis A virus and higher
binding affinity, reaching picomolar dissociation constants
(unpublished data). This increase of apparent
affinity is in accordance with the increase of avidity
expected from the bivalent binding of the IgG when
compared to the monovalent Fab fragment. In gener-

By analyzing the human antibody repertoire in terms of structure, amino
acid sequence diversity and germline usage, we found that seven VH and
seven VL (four Vκ and three Vλ) germline families cover more than 95%
of the human antibody diversity used. A consensus sequence was
derived for each family and optimized for expression in Escherichia coli.
In order to make all six complementarity determining regions (CDRs)
accessible for diversification, the synthetic genes were designed to be
modular and mutually compatible by introducing unique restriction
endonuclease sites flanking the CDRs. Molecular modeling verified that
all canonical classes were present. We could show that all master genes
are expressed as soluble proteins in the periplasm of E. coli. A first set of
antibody phage display libraries totalling 2 × 10^9 members was created
after cloning the genes in all 49 combinations into a phagemid vector,
itself devoid of the restriction sites in question. Diversity was created by
replacing the VH and VL CDR3 regions of the master genes by CDR3
library cassettes, generated from mixed trinucleotides and biased towards
natural human antibody CDR3 sequences. The sequencing of 257 members
of the unselected libraries indicated that the frequency of correct and
thus potentially functional sequences was 61%. Selection experiments
against many antigens yielded a diverse set of binders with high affinities.
Due to the modular design of all master genes, either single binders
or even pools of binders can now be rapidly optimized without
knowledge of the particular sequence, using pre-built CDR cassette
libraries. The small number of 49 master genes will allow future
improvements to be incorporated quickly, and the separation of the frameworks
may help in analyzing why nature has evolved these distinct
subfamilies of antibody germline genes.

Introduction

The selection of antibody fragments from
libraries using enrichment technologies such as
phage-display (Smith & Scott, 1993), ribosome display
(Hanes & Plückthun, 1997), bacterial display
(Georgiou et al., 1997) or yeast display (Kieke et al.,
1997) has proven to be a successful alternative to
classical hybridoma technology (for recent reviews,
see Winter et al., 1994; Hoogenboom et al., 1998;
Spada et al., 1997; Rodi & Makowski, 1999). Phage
display was developed first (Smith, 1985) and has
been improved the furthest, especially in the antibody
field. It is likely that conventional hybridoma
technology will be superseded by a combination
of these technologies, since these approaches are
faster, involve no animals, yield antibodies of at
least comparable affinities and work also with self-antigens
or toxic molecules (Hoogenboom et al.,
1998). The selection of antibodies must start from
an initial, highly diverse library. Here, we describe
the construction of such a library by total gene synthesis,
based on a structural analysis of the human
antibody repertoire.

Human antibodies are of particular interest,
since they are considered to be valuable for therapeutic
applications (Carter & Merchant, 1997),
avoiding the HAMA (human anti-mouse antibody)
response frequently observed with rodent antibodies.
Although it has been demonstrated in
many examples (Dall'Acqua & Carter, 1998) that
chimerization or humanization of rodent antibodies
through protein engineering can successfully
retain the affinity and specificity of the
parental molecule (Baca et al., 1997), this strategy is
time-consuming and still does not yield fully
human antibodies.

Previous phage-display libraries of human antibodies
have been generated from immunized
donors (Barbas & Burton, 1996), germline
sequences (Griffiths et al., 1994) or, most recently,
naive B-cell Ig repertoires (Vaughan et al., 1996;
Sheets et al., 1998; De Haard et al., 1999). Selection
from these libraries by phage-display has yielded
human antibodies against numerous haptens, peptides
and proteins. While these libraries have all
been successful, their uncontrollable composition,
problems with the subsequent expression of
the antibodies (see below) and restricted engineering
possibilities made it desirable to use a complete
protein engineering approach to solve the problem.

The success of obtaining high-affinity antibodies
is generally assumed to be related to the initial
library size (Perelson, 1989), even though the exact
relation may not be tractable by theoretical considerations,
as it may be antigen-dependent. Consequently,
successful "one-pot" libraries have all
been large (Griffiths et al., 1994; Vaughan et al.,
1996; Sheets et al., 1998; De Haard et al., 1999). It is
important to note that only the functional
library size, i.e. the number of correctly
assembled clones without any frameshift, stop
codon or deletion, will contribute to the diversity.
This number can be orders of magnitude below
the apparent diversity usually reported, which is
normally obtained by counting the numbers of
transformants.
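
The distinction between apparent and functional library size can be made concrete with the figures quoted in the abstract (2 × 10^9 members, 257 sequenced clones, 61% correct). The sketch below simply scales the apparent size by the observed fraction; the normal-approximation interval is an assumption added here for illustration and is not a calculation reported by the authors.

```python
import math


def functional_size(total_members: float, n_sequenced: int,
                    fraction_correct: float, z: float = 1.96):
    """Estimate functional library size with a rough confidence interval.

    Returns (estimate, lower, upper). The interval uses the normal
    approximation to the binomial for the sequenced sample.
    """
    se = math.sqrt(fraction_correct * (1.0 - fraction_correct) / n_sequenced)
    lower = max(0.0, fraction_correct - z * se)
    upper = min(1.0, fraction_correct + z * se)
    return (total_members * fraction_correct,
            total_members * lower,
            total_members * upper)


if __name__ == "__main__":
    est, lo, hi = functional_size(2e9, 257, 0.61)
    print(f"functional members: {est:.2e} (approx. range {lo:.2e} to {hi:.2e})")
```

With these inputs the estimate is about 1.2 × 10^9 functional members, the same order of magnitude as the apparent size.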

It has been shown that the Escherichia coli
expression yields of functional antibody fragments
can vary dramatically, even if the antibody gene is
expressed in the same format, vector and
expression strain. This effect has been shown to
depend on cellular folding, which in turn is influenced
by the antibody sequence and can be successfully
improved by protein engineering
(Knappik & Plückthun, 1995). There is growing
evidence that critical amino acid residues located
in turns at the surface or at the variable-constant
(V-C) interface are responsible for the misfolding,
aggregation or even toxic effects on the E. coli cells,
hence leading to poor expression yields. Mutating
those residues improved expression titers several-fold,
without adversely affecting the binding properties
(Deng et al., 1994; Knappik & Plückthun,
1995; Ulrich et al., 1995; Jung & Plückthun, 1997;
Nieba et al., 1997; Forsberg et al., 1997). As phage
display depends on correctly folded antibodies,
there is some selection against poor folders (Deng
et al., 1994; Jackson et al., 1995; Jung & Plückthun,
1997; Bothmann & Plückthun, 1998), and thus the
functional library size will be decreased. However,
the selection is clearly not stringent enough to
ensure that all molecules selected from a phage display
library will have acceptable folding properties.
Thus, to maintain diversity and ensure
reasonable expression properties of the selected
molecules, it would be advantageous to create antibody
libraries starting from well-expressed frameworks.
While such approaches have been reported
(Pini et al., 1998; Jirholt et al., 1998), only single frameworks
have been used in these attempts, and
consequently, the structural diversity does not
approach that of other naive libraries.

The humoral immune system, however, does not 
work by the "single-pot" approach (Nissim et al.,
1994), but rather uses an evolutionary strategy. 
The initial, antigen-independent variability is first 
generated during B-cell development by gene 
rearrangements (V(D)J-joining), leading to more 
than 10^8 different molecules at any one time in a
human being (Winter, 1998). After a B-cell is activated,
the antigen-driven process of somatic
mutation is initiated (Rajewsky, 1996), and remarkable
improvements in binding can be found. It has
been shown that mutations occurring in CDRs 1
and 2 are preferentially selected (Wagner &
Neuberger, 1996; Ignatovich et al., 1997; Green et al.,
1998), as their diversity in the initial germline variants
is much more limited than that of the CDR3s
(Tomlinson et al., 1996). The design of an artificial
library should make it convenient to follow this
same approach. Indeed, previous experiments with
peptides (Cwirla et al., 1997), RNA-aptamers (He
et al., 1996) and antibodies (Schier et al., 1996a;
Hanes et al., 1998) have shown that the evolutionary
approach and, in the case of antibodies, CDR
walking (Yang et al., 1995; Schier et al., 1996a; Wu
et al., 1998) can dramatically improve affinities.
However, in the absence of suitably engineered
genes, such an optimization can be extremely
laborious.

The human antibody germline repertoire has
recently been completely sequenced. There are
about 50 functional VH germline genes located on
chromosome 14 (Tomlinson et al., 1992; Matsuda
& Honjo, 1996), which can be grouped into six subfamilies
according to sequence homology. About
40 functional VL kappa genes comprising seven
subfamilies are located on chromosome 2 (Cox
et al., 1994; Barbie & Lefranc, 1998), and about 30
functional VL lambda genes grouped into ten subfamilies
can be found on chromosome 22 (Williams
et al., 1996; Kawasaki et al., 1997; Pallares et al.,
1998). The groups vary in size from one member
(e.g. VH6 and Vκ4) to up to 22 members (VH3), and
the members of each group share a high degree of
sequence homology. By comparing rearranged
sequences of human antibodies with their germline
counterparts we (this work) and others (Cox et al.,
1994; Ignatovich et al., 1997) have found that many
human germline genes are never or only very
rarely used during an immune response.

In structural terms, the VH and VL domains comprising
the antigen binding Fv moiety (see Figure 1)
share a common fold that, in its central portions, is
almost perfectly superimposable, even when fragments
from different species are compared
(Chothia et al., 1998). Larger differences are
observed only in the conformation of the CDRs,
and it has been shown in a series of studies
(Chothia & Lesk, 1987; Chothia et al., 1989;
Al-Lazikani et al., 1997) that all CDRs except VH
CDR3 adopt only a few distinct conformations.
Hence the repertoire of conformations is limited to
a relatively small number of discrete structural
classes, depending on both the CDR length and the
so-called canonical amino acid residues (Chothia &
Lesk, 1987).

Here, we report the design, construction and
analysis of a novel human antibody library concept
designated HuCAL (Human Combinatorial Antibody
Libraries). Each of the human VH and VL
subfamilies that is frequently used during an
immune response is represented by one consensus
framework, resulting in seven HuCAL master
genes for heavy chains and seven for light chains,
and thus 49 combinations. All genes were made by
total synthesis, taking into consideration
codon usage, unfavorable residues that promote
protein aggregation, and unique and general
restriction sites flanking all CDRs, leading to modular
genes that contain readily accessible CDRs
and can be easily converted into different antibody
formats.
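
The count of 49 follows from pairing every heavy chain master gene with every light chain master gene. A minimal sketch of this enumeration is given below; the gene labels are placeholders invented for illustration and are not the names used by the authors.

```python
from itertools import product

# Placeholder labels (hypothetical): seven heavy chain master genes and
# seven light chain master genes (four kappa plus three lambda).
HEAVY = [f"VH-{i}" for i in range(1, 8)]
KAPPA = [f"Vkappa-{i}" for i in range(1, 5)]
LAMBDA = [f"Vlambda-{i}" for i in range(1, 4)]
LIGHT = KAPPA + LAMBDA

# Every heavy/light pairing corresponds to one framework combination.
COMBINATIONS = list(product(HEAVY, LIGHT))

if __name__ == "__main__":
    print(len(HEAVY), "x", len(LIGHT), "=", len(COMBINATIONS))
    print(COMBINATIONS[:3])
```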

A first set of antibody libraries based on the
HuCAL concept was created by randomizing both
the VH and VL CDR3 encoding regions of the 49
master genes using trinucleotide cassette mutagenesis
(Virnekäs et al., 1994), which leads to high-quality
libraries. The cassettes were designed such
that the naturally occurring diversity was covered,
both in terms of length and amino acid composition.
The final HuCAL antibody libraries
(HuCAL version 1) were extensively characterized
by sequencing, expression behavior and numerous
selection experiments against a wide variety of
antigens.

† http://immuno.bme.nwu.edu/
‡ Now available at VBASE, http://www.mrc-cpe.cam.ac.uk/imt-doc/public/INTRO.html


Results 

Analysis of the human antibody repertoire 

Sequence analysis 

Amino acid sequences from variable domains of 
human immunoglobulins were collected from 
Kabat (Kabat et al, 1991; Johnson et al, 1996;f) and 
Genbank (Benson et al, 1997) and incorporated 
into three databases, V heavy chain (V H ), V kappa 
(Vk) and V lambda (VX), and aligned, using the 
Kabat numbering system. For each of the three 
chain types, rearranged sequences were collected 
whenever more than 70 positions had been deter- 
mined, giving 386, 149 and 675 entries for Vk, VA. 
and V H , respectively, at the time of library design. 
Similarly, all germline sequences were collected 
(48, 26 and 43 entries for Vk, VX and V H , respect- 
ively), as the complete locij (see Cook & 
Tomlinson, 1995), had not been published at that 
time. Finally, all known D and J sequences were 
collected. Although the design was started before 
the complete germline repertoire was known, the 
availability of the whole repertoire and a larger 
number of rearranged sequences would not have 
influenced the library design, which was demon- 
strated by repeating the analysis using the com- 
plete germline repertoire and a larger database 
(846, 413 and 1201 entries for Vk, Vλ and V H , 
respectively) of human rearranged sequences (see 
Figure 2). 

The binning into families is somewhat arbitrary, 
depending on how the homology cutoff between 
families is defined. Initially, for Vk, seven families 
were established, Vλ was divided into eight 
families and V H into six families. The single V H 
germline gene of the V H 7 family (van Dijk et al, 
1993) was included in the V H 1 family, since the 
genes of the two families are highly homologous. 
Upon more detailed analysis, regarding canonical 
CDR conformations and canonical framework resi- 
dues as well as gene usage (see below), the number 
of families was raised to seven for V H , but was 
reduced to four for Vk and three for Vλ. 

To further examine the concept of constructing 
HuCAL using the equidistant partitioning of 
sequence space as an efficient means to engineer 
library diversity, it was important to test the usage 
of the structural groups in actual rearranged genes 
of antibodies. By counting the number of differ- 
ences between each rearranged entry and each 
germline sequence, the nearest germline counter- 
part was identified for each rearranged sequence. 
Altogether, 532 (79 %) V H sequences and 474 (86 %) 
V L sequences (343 Vk and 131 Vλ) could be clearly 
assigned to germline counterparts. 
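
The assignment of each rearranged sequence to its nearest germline counterpart can be illustrated with a minimal Python sketch. It assumes that all sequences have already been aligned to a common numbering scheme, so that positions can be compared one to one. The toy sequences, the function names and the rule that a tie counts as "not clearly assigned" are illustrative assumptions and are not taken from the analysis described here.

```python
def count_differences(seq_a, seq_b):
    """Count aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def nearest_germline(rearranged, germlines):
    """Return (name, differences) for the closest germline sequence.

    Returns (None, differences) when two or more germlines are equally
    close, which this sketch treats as "not clearly assignable"
    (an assumption made for illustration).
    """
    scored = sorted(
        (count_differences(rearranged, seq), name)
        for name, seq in germlines.items()
    )
    best_diff, best_name = scored[0]
    if len(scored) > 1 and scored[1][0] == best_diff:
        return None, best_diff
    return best_name, best_diff


# Hypothetical aligned fragments, for illustration only.
toy_germlines = {"G1": "QVQLVQSG", "G2": "EVQLVESG"}
assignment = nearest_germline("QVQLVQSA", toy_germlines)
```

Counting how often each germline name is returned over a whole database would then give usage percentages of the kind reported in Table 1.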

Figure 1. A representation of V L and V H structures, consensus hydrogen bonding pattern and antigen contacts. 
Residues are color-coded (from white to red) to indicate the reduction of residue solvent-accessible surface upon anti- 
gen binding, averaged over 52 liganded structures taken from the Brookhaven protein structure database 
(http://www.rcsb.org/pdb/). Residue numbering is according to Chothia et al. (1992), Tomlinson et al. (1995) and Williams 
et al. (1996), and CDR definitions conform to Kabat et al. (1991). Complementarity determining regions CDR1, CDR2 
and CDR3 are indicated by blue, green and pink coloring, and framework regions by gray underlays. 


Our results (see Table 1 and Figure 2) confirm 
the biased usage of human germline genes ana- 
lysed previously (Tomlinson et al, 1992; Cox et al, 
1994; Ignatovich et al, 1997). The V H germline gene 
usage was found to be restricted to about 12 genes 
from five sub-families, which are used in approxi- 
mately 80% of all cases. The V H 2 family is only 
rarely used. Only four of the Vk germline families 
were found to be used, and out of these only seven 
genes were used frequently (81%). The Vλ germ- 
line gene usage was found to be restricted to three 
families, which are used in 93% of all cases, and 
five genes from these three families were used 
most frequently (Table 1). We concluded that the 
vast majority (98 % of all V H , more than 99 % of all 
Vk and more than 93 % of all Vλ) of human anti- 
bodies are derived from only five V H and seven V L 
families (four Vk and three Vλ). Although the 
three germline genes of the V H 2 family are not fre- 
quently used, we decided to cover all six V H 
families with our consensus approach, and there- 
fore we included this family for further analysis. 


The strategy of the synthetic library approach 
was therefore to represent each family by one 
representative member, subject to verification of 
the structural consequence of the distribution of 
CDR conformations (see the next section). 

Structural analysis 

Despite their great variability in length and 
sequence, the antigen binding loops, denoted CDRs 
(complementarity determining regions), have been 
shown to adopt only a limited number of main-chain 
conformations, termed canonical structures 
(Chothia et al, 1989). The adopted 
structure depends on both the CDR length and the 
identity of certain key amino acid residues, both in 
the CDR and in the contacting framework, 
involved in its packing. The six V H , four Vk and 
three Vλ germline families, as defined above from 
the dendrogram analysis, were therefore analyzed 
for the canonical structures of CDRs that they were 
predicted to encode, in order to define the structur- 
al repertoire covered by these families (Table 1). In 

Table 1. Frequency of germline family usage and corresponding types of canonical structures 

Family usage           Frequently used germline genes    Canonical structure    Chosen HuCAL canonical 
Subfamily  Usage (%)   Locus     DP name   Usage (%)     prediction             structures 

VH1        19          1-69      DP-10     6             H2-2                   H2-2 
                       1-18      DP-14     4             H1-1, H2-3             H1-1, H2-3 
                       1-02      DP-8      4 
VH2        2                                             H1-3, H2-1             H1-3, H2-1 
VH3        34          3-23      DP-47     12            H2-1 
                       3-30.3    DP-46     5             H1-1, H2-3             H1-1, H2-3 
                       3-48      DP-51     3             H2-4 
VH4        12          4-34      DP-63     5             H1-1 
                                                         H1-2, H2-1             H1-1, H2-1 
                       4-59      DP-71     4             H1-3 
VH5        19          5-51      DP-73     16            H1-1, H2-2             H1-1, H2-2 

                       A3        DPK15                   L1-3 
Vk3        51          A27       DPK22     29            L1-2 
                       L6                  11            L1-6, L2-1             L1-6, L2-1 
                       L2        DPK21     10 
Vk4        10          B3        DPK24     10            L1-3, L2-1             L1-3, L2-1 
Vk5-7      0                                             L1-2, L2-1 

Vλ1        31          1b        DPL5      13            13 
                                                         14, 7                  13, 7 
                       1c        DPL2      11 
Vλ2        33          2a2       DPL11     18 
                                                         14, 7                  14, 7 
                       2e        DPL12     11 
Vλ3        29          3r        DPL23     15            11, 7                  11, 7 
Vλ4-10     8           -         -         -             12, 7 
                                                         13, 11 
                                                         14, 12 

The human immunoglobulin germline subfamilies are listed together with their percentage usage as calculated by comparison 
with rearranged sequences. The percentage usage is determined from using the initial database of rearranged sequences with 1006 
entries. The percentage usage calculated from the updated database with 2460 entries is given in Figure 2. The most frequently used 
germline genes according to our analysis are also given (locus name as well as DP nomenclature, see Tomlinson et al. (1992) for V H , 
Cox et al. (1994) for Vk, and Williams et al. (1996) for Vλ) together with their corresponding usage (derived from analysis of the 
smaller database). For details of the calculation, see the text. The canonical conformations that are present in each subfamily are 
shown together with the canonical conformations that have been chosen for HuCAL design. The canonical structure nomenclature is 
according to Chothia et al. (1992) for V H , Tomlinson et al. (1995) for Vk, and Williams et al. (1996) for Vλ. 


the following, we will use the CDR definitions 
given by Kabat et al. (1991) (see also Figure 1) and 
the sequence numbering according to structural 
criteria defined by Chothia (Chothia et al, 1992; 
Tomlinson et al, 1995; Williams et al, 1996). 

The structural repertoire of the human V H 
sequences was previously analyzed in detail by 
Chothia et al (1992). In total, three conformations 
of CDR1 (H1-1, H1-2 and H1-3) and five confor- 
mations of CDR2 (H2-1, H2-2, H2-3, H2-4 and H2- 
5) have been defined, and the observed combi- 
nations have led to the conclusion that almost all 
sequences have one of seven main-chain folds. For 
the highly diverse CDR3, which is encoded by the 
D and J-minigene segments and uncoded nucleo- 
tides (N-region diversity), structural families have 
been defined only very recently (Morea et al, 1998; 
Oliva et al, 1998), but structural predictions are not 


approaching the accuracy seen for the canonical 
folds of the other CDRs. 

All members of the V H 1 family encode the CDR1 
conformation H1-1, but differ in their CDR2 con- 
formation: both the H2-2 and the H2-3 confor- 
mation were found in five germline genes. Since 
these two types of CDR2 conformations are 
defined by different types of amino acids at pos- 
ition 71 located in framework 3, we divided the 
V H 1 sub-family into two further sub-families: 
V H 1A with CDR2 conformation H2-2 (alanine at 
position 71) and V H 1B with the conformation H2-3 
(arginine at position 71). Upon model building (see 
below), we decided to include both gene types into 
the library design and to construct both a V H 1A 
and V H 1B master gene (see below). 

The members of the V H 2 family were all pre- 
dicted to have the conformations H1-3 and H2-1 in 
CDR1 and CDR2, respectively. 

Figure 2. Coverage of germline sequence space by HuCAL sequences (panels: V L kappa, V L lambda and V H ). The 
protein sequences representing the human V L and V H germlines were taken from VBase 
(http://www.mrc-cpe.cam.ac.uk/imt-doc/public/INTRO.html) and aligned to the 14 HuCAL sequences. The Phylip 
(http://evolution.genetics.washington.edu/phylip.html) and ClustalW (see 
ftp://ftp.ebi.ac.uk/pub/software/mac/clustalw/) phylogeny program packages were used to generate separate 
unrooted trees for the V L kappa, V L lambda and V H sequences. Percentages indicate the fraction of rearranged 
sequences in the database that cluster within the different germline subgroups. For these calculations, we used a 
database of rearranged sequences with 846 V L kappa, 413 V L lambda and 1201 V H sequence entries. The difference 
to 100% in the case of V L kappa (0.5%) and lambda (16.4 %) is due to rarely used germline subfamilies that are not 
represented by the HuCAL master genes. 


The CDR1 conformation of the V H 3 family mem- 
bers was predicted in all cases to be H1-1, but 
three different types were found for CDR2 (H2-1, 
H2-3, H2-4). In these CDR2 conformations, the 
canonical framework residue 71 is always arginine, 
while the loop conformation of CDR2 is defined by 
the residues 52a and 55 as well as the length vari- 
ation. Of the rearranged V H 3 sequences, 80 % were 
predicted to contain the H2-3 conformation. There- 
fore, the V H 3 family is best represented by a 
sequence containing the canonical conformations 
H1-1 and H2-3, even though the more groove-like 
shapes of binding sites from the longer CDR-H2 
types may be introduced later by CDR shuffling. 

The V H 4 family members were predicted to con- 
tain three types of CDR1 conformations; namely, 
H1-1, H1-2 and H1-3. The CDR1 canonical frame- 
work residue 26 was found to be glycine in all 
cases, and the CDR1 loop conformation is defined 
solely by residues located in this region. Since 62% 
of all rearranged V H 4 sequences contained the H1-1 
type of CDR1, this conformation was chosen for 
representing the V H 4 family. The CDR2 confor- 
mation of the V H 4 members was found to be H2-1 
in all cases. 

The two members of the V H 5 family were found 
to have the conformation H1-1 and H2-2, and the 
single germline gene of the V H 6 family had the 
conformation H1-3 and H2-5 in CDR1 and CDR2, 
respectively. Hence, in structural terms the 
majority of the frequently used members of the six 
V H families can be represented by seven sequences, 
since only the V H 1 family contained two types of 
canonical CDR folds defined by residues in the fra- 
mework region, and since we decided to represent 
V H 3 and V H 4 by their most prevalent 
type. The canonical conformations not present in 
the design can be incorporated later during CDR 
library generation, since the key residues for those 
conformations are part of the CDR itself. 

The structural repertoire of the human Vk germ- 
line sequences was analyzed by Tomlinson et al. 
(1995). There are four conformations of the CDR1, 
which are defined by the length of the loop (7, 8, 
12 and 13 amino acid residues) and the nature of 
residues 2, 25, 29, 33 and 71. The CDR2 loop of 
human Vk domains is only three amino acid resi- 
dues long in all cases, and is predicted to adopt a 
single canonical fold. Most human Vk germline 
segments also encode a single conformation of the 
CDR3 loop, which is stabilized by the conserved 
cis-proline 95, but other conformations in 
rearranged sequences are possible due to the pro- 
cess of V-J joining and the potential loss of this 
proline residue. Since the CDR3 region was 
planned to be randomized for library generation, 
this area was not considered for the consensus 
sequence design. Hence, the structural repertoire of 
Vk domains is essentially defined by the confor- 
mation of the CDR1 region. All members of the 
Vk1 family contained a seven residue CDR1 (L1-2), 
and the most frequently used members of the Vk2 
family contained a 12 residue CDR1 (L1-4). The 
members of the Vk3 family contained either a 
seven (L1-2) or an eight (L1-6) residue CDR1. Since 
the canonical framework residues that additionally 
define the CDR1 conformation are identical in both 
cases, and since more than 60 % of the rearranged 
Vk3 sequences contained the CDR1 conformation 
L1-6, this type was chosen for the consensus 
sequence. The single germline member of the Vk4 
family contained a 13 residue CDR1 (L1-3). 

The structural repertoire of the human Vλ germ- 
line sequences was analyzed by Williams et al. 
(1996). The three families analyzed here encode 
identical conformations of the CDR2 loop. The 
CDR3 loop conformation is thought to be more 
highly variable, as there is some length variation 
and no cis-proline residue. Since this part was 
planned to be randomized for library generation, 
this area was not considered for the consensus 
sequence design. Although the CDR1 region of the 
Vλ1 family contains either 13 or 14 amino acid resi- 
dues, it is thought to adopt a single conformation, 
since the canonical key residues are conserved and 
the additional insertion of one residue has little 
effect on the overall structure (Chothia & Lesk, 
1987). A CDR1 length of 13 residues, which was 
found in more than 90% of all rearranged Vλ1 
sequences, was chosen for the Vλ1 consensus. The 
members of the Vλ2 and Vλ3 families each encode 
a single defined type of CDR1 loop structure: the 
Vλ2 family encodes a CDR1 loop of 14 residues, 
and the CDR1 loop length of the Vλ3 family is 11 
residues. 

In summary, from the eight different pairs of 
CDR1-CDR2 conformations encoded by the Vk 
and Vλ germline genes that are used frequently, 
seven could be represented by four Vk and three 
Vλ consensus genes. The remaining CDR1 confor- 
mation (seven residue CDR1 loop in the Vk3 
family) is not defined by canonical key residues in 
the framework region and can therefore be inserted 
into the Vk3 consensus sequence during library 
generation. From the 11 different family-specific 
pairs of CDR1-CDR2 conformations found in the 
six V H germline families, seven could be covered 
by dividing the family V H 1 into two families 
(V H 1A and V H 1B). The remaining four pairs (two 
in the V H 3 and two in the V H 4 family) were either 
not found in rearranged antibody sequences or are 
defined by the CDRs themselves and will therefore 
have to be created during the construction of CDR 
libraries. Hence, the structural repertoire of 
the human V genes used could be covered by 49 
(7 V H x 7 V L ) different frameworks. 
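
The count of 49 frameworks follows directly from pairing every V H framework with every V L framework. A short Python sketch of this enumeration is given below; the family names follow the text, while the variable names and the spelled-out "Vlambda" labels are only illustrative choices.

```python
from itertools import product

# Seven V H frameworks (V H 1 split into 1A and 1B) and seven V L frameworks.
VH_FRAMEWORKS = ["VH1A", "VH1B", "VH2", "VH3", "VH4", "VH5", "VH6"]
VL_FRAMEWORKS = ["Vk1", "Vk2", "Vk3", "Vk4", "Vlambda1", "Vlambda2", "Vlambda3"]

# Every V H framework combined with every V L framework.
framework_pairs = list(product(VH_FRAMEWORKS, VL_FRAMEWORKS))
```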


Design of consensus frameworks 

The compilation of rearranged sequences was 
first divided into separate groups (four Vk, three 
Vλ and seven V H ) according to the germline 
families described above. These protein sequence 
databases were used to compute the consensus 
sequences of each subgroup. By using the 
rearranged sequences instead of the germline 
sequences for calculating the consensus, the con- 
sensus was automatically weighted according to 
the frequency of usage. Additionally, frequently 
mutated and highly conserved positions could be 
identified. 
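
As an illustration of how a usage-weighted consensus and a per-position conservation measure could be derived from aligned rearranged sequences, a minimal Python sketch follows. The gap symbol, the choice to ignore gaps, the function names and the toy alignment are assumptions made for this example; the exact procedure used for the HuCAL design is not specified at this level of detail.

```python
from collections import Counter


def column_profile(aligned_sequences, gap="-"):
    """Return, per alignment column, (most common residue, its fraction).

    Gap characters are ignored when counting (an assumption of this sketch).
    """
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    profile = []
    for column in zip(*aligned_sequences):
        counts = Counter(res for res in column if res != gap)
        if not counts:
            profile.append((gap, 0.0))
            continue
        residue, n = counts.most_common(1)[0]
        profile.append((residue, n / sum(counts.values())))
    return profile


def consensus(aligned_sequences):
    """Join the most common residue of every column into one sequence."""
    return "".join(res for res, _ in column_profile(aligned_sequences))


# Hypothetical aligned fragments of one family, for illustration only.
toy_family = ["EVQLV", "QVQLV", "EVKLV", "EVQL-"]
toy_consensus = consensus(toy_family)
toy_conservation = [round(frac, 2) for _, frac in column_profile(toy_family)]
```

Because every rearranged sequence contributes one vote per column, germline genes that are used more often automatically carry more weight. Columns with a low fraction point to frequently mutated positions, and columns with a fraction near 1 to highly conserved ones.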

For the CDR1 and CDR2 regions, the consensus 
of rearranged sequences was replaced with the 
amino acid sequence of one of the germline 
sequences of the corresponding family. This pro- 
cedure removes any bias, as the CDRs of 
rearranged and mutated sequences are known to 
be mutated due to selection towards their particu- 
lar antigens. In the case of Vλ, a few amino acid 
exchanges were introduced in some of the chosen 
germline CDRs in order to avoid structural con- 
straints (position 30b in Vλ1 and positions 27 and 
34 in Vλ3, see Figure 3). 

To construct, assemble and verify the genes, as 
well as to obtain preliminary information on 
expression behavior, it was advantageous to first 
substitute the intended library of CDR3-H and 
CDR3-L cassettes with defined dummy sequences. 
We chose the sequences 89 QQHYTTPP and 
95 WGGDGFYAMDY for the V L and V H chains, 
respectively, which are derived from the antibody 
4D5 (Carter et al, 1992a) and are known to be 
favorable for antibody folding in E. coli (Jung & 
Plückthun, 1997). Even though molecular modeling 
indicates that the omega loop from Vk is not ideal 
in a Vλ framework because of steric clashes, good 
expression behavior could still be obtained, 
demonstrating the robustness of the frameworks 
(see below). 

For the framework 4 regions, encoded by the J- 
elements, the consensus of the rearranged 
sequences in each family was calculated and found 
to be identical in all families of V H and V L (k and 
λ). This shows that there is no correlation between 
V-usage and J-usage (Baskin et al, 1998). In all 
three cases, this consensus sequence was identical 
with at least one of the naturally occurring 
sequences encoded by joining elements, indicating 
that the sequence is able to exist. 

We have described, up to this point, only 
sequence information that was used to design the 
consensus sequences. It could therefore not be 
excluded that the consensus would lead to a mol- 
ecule whose sequence might "jump" between 
different naturally occurring sequences, thereby 
creating certain artificial combinations of amino 
acid residues that are located far away in the 
sequence but give rise to contacts in the three- 
dimensional structure. It was therefore essential to 
verify the sequences by structural means. Other- 
wise, the uncritical use of the algebraic consensus 
might obscure a hidden interaction between certain 
residues, which can occur only in certain combi- 
nations. While this approach may also keep resi- 
dues together that are linked only historically, it 
does safeguard against losing hidden long-range 
interactions (Saul & Poljak, 1993). As a first check, 
the most homologous rearranged sequence for 
each consensus sequence was identified by search- 
ing against the compilation of rearranged 
sequences, and all positions where the consensus 
differed from this nearest rearranged sequence 
were inspected (see Materials and Methods). Fur- 
thermore, models for the seven V H and seven V L 
consensus sequences were built and analyzed 
according to their structural properties (see the 
next section). As a result of this analysis, the fol- 
lowing residues were exchanged (given is the pos- 
ition according to Kabat's numbering scheme, the 
substitution performed, and the name of the gene 
family): S^T (V H 2), N L34 A (Vk1), G L9 A, D L60 A, 
R L77 S (Vk3) and V L78 T (Vλ3). 

† see http://evolution.genetics.washington.edu/phylip.html 
‡ ftp://ftp.ebi.ac.uk 
§ http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html 

After the consensus protein sequences were 
designed, phylogenetic trees were built with the 
programs PHYLIP† and ClustalW‡ (Thompson 
et al., 1994). For this representation, we repeated 
the analysis of germline usage based on an 
updated database of rearranged human antibody 
sequences that was more than twice the size of the 
original database that we used for the design of 
the HuCAL sequences. Separate unrooted trees 
were built for the Vk, Vλ and V H sequences 
(Figure 2). This analysis illustrates the strategy 
adopted in the present study, which is an attempt 
to approach a more equidistant representation of 
sequence space, by having only one member for 
each of the main "branches" of the tree. By analyz- 
ing each consensus sequence as if it were a mem- 
ber of the germline, its position in the sequence 
map is indicated, showing that it truly represents 
its family (Figure 2). 

Molecular modeling and analysis 

To obtain more information about the packing, 
CDR conformations and framework properties, all 
seven V H frameworks, all four Vk frameworks and 
the three Vλ frameworks were built via homology 
modeling. As a basis, a complete structural align- 
ment of the approximately 100 independent anti- 
body sequences available in the PDB (Bernstein 
et al, 1977) was carried out as indicated in the 
legend to Figure 3. Usually, the template with the 
highest resolution and the fewest mutations rela- 
tive to the consensus sequence to be modeled was 
used. For all models, multiple templates were com- 
pared, such that the effect of mutations in any of 
the templates could be evaluated directly from the 
structural alignment. The experimental structures 
displaying the highest degree of similarity to each 
of the HuCAL constructs are listed in Table 1 of 
the Supplementary Material. 

In the models (see Figure 4), the dummy CDR3 
sequences from the antibody hu4D5 (version 8) are 
shown (PDB file 1FVC). All models were checked 
with the program PROCHECK§ (Morris et al, 1992; 
Laskowski et al, 1993) and were shown to have no 
more residues in the less favorable regions of the 
Ramachandran plot than the template structures 
(some unfavorable torsion angles in loop regions 
are highly conserved, e.g. position 51 at the tip of 
CDR2 in V L ), as well as having no obvious cavity 
or unusual exposed hydrophobic region, and a full 
set of standard variable domain hydrogen bonds. 

Consistent with sequence considerations, the 
great majority of canonical structures was pre- 
dicted to be present by model building, when com- 
paring the critical residues with the templates. 
More recent work (unpublished results), based on 
previous experimental observations from X-ray 
crystallography (Saul & Poljak, 1993) and muta- 
genesis (Langedijk et al, 1998), has uncovered sev- 
eral more structural relationships within each V H 
domain, which may contribute to diversity. Par- 
ticularly, relationships between the nature of the 
residues H6, H7 and H9, due to the different 
hydrogen bonding pattern of H6 to the backbone, 
can transmit a conformational change through the 
protein via residues H18, H82, H67, and H63. Our 
analysis showed that all types of conformations 
that occur commonly in natural human frame- 
works are represented in the chosen consensus fra- 
meworks. 

In the V H 3 group of germline sequences, there is 
more variation in CDR2, because of the length 
variation of a two amino acid residue insertion 
occurring in a group of human sequences (pos- 
itions 52b and c). These antibodies might form 
more cleft-like binding pockets, and this diversity 
is not present in the original library design, even 
though many other combinations of frameworks 
would be able to form cavities and clefts as well. 
Through the modular design, however, these long- 
er CDR2 elements can easily be introduced by cas- 
sette mutagenesis. 

An analysis in analogy to that reported by Nieba 
et al (1997) showed that the exposed residues at 
the V/C interface are already of low hydrophobi- 
city in all consensus frameworks, consistent with 
their superior expression behavior in E. coli (see 
below). Moreover, many of the residues identified 
as crucial for stability and clearly selectable by 
phage display, such as P L8 (defining a conserved 
kink in the first β-strand with a cis-peptide bond in 
Vk domains, or the trans-proline residues at pos- 
itions 8 and/or 9 in Vλ domains, see Spada et al, 
1998) are present in all master sequences. Residue 
R H66 , which is part of a conserved charge cluster, 
and frequently K in murine antibodies, where it 
leads to lower stability (see Proba et al, 1998), is 
present in all master genes except V H 5, where the 
consensus was found to be Q H66 . All residues 
known to make conserved side-chain hydrogen 
bonds are present in the master genes. Side-chain 
to side-chain: R H38 to Q H46 , D H86 and Y H90 ; R H66 to 
D H86 ; R H94 to D H101 ; Q L6 to T L102 ; (L L37 in Vk2) 
to Y L86 ; R L61 to D L82 . Side-chain to main-chain CO: 
R H66 to H 82a ; T H87 to X H84 ; Q L6 to X Lse; Qlss to X^. 
Main-chain NH to side-chain: X H69 to Y H59 ; X H75 to 
D H72 ; X H83 to D H86 ; X H92 and X H106 to E H6 or Q H6 ; 
X H m to T H87 ; X L79 to D L82 ; X L88 and X L101 to Q L6 . 
Interdomain: Q^g to Q H39 . In this listing, X refers 
to positions without dominant residue preference. 


The relative orientation of V L with respect to V H 
is still understood only poorly, and will depend on 
the exact pairwise combination and on the specific 
CDR3 sequences. Frequently, monoclonal anti- 
bodies are found with mutations within the inter- 
face. This introduces further uncertainty in 
building a model of the combining site, because a 
small deviation in angle can have a large effect at 
the top of the binding site. This variability of the 
relative orientation of the two domains is particu- 
larly large for Vk domains and Vk lacking the cis- 
Pro in position L95, and is further modulated by 
non-tyrosine residues in position L49. The "elbow" 
of ordinary Vk CDR3 inserts around L96 into a 
notch in V H and restricts the flexibility of the inter- 
face. Since the interface residues are highly con- 
served between all the consensus antibodies (see 
Figure 3), and since very similar frameworks are 
available as templates in the database, more 
reliable models may be possible for HuCAL anti- 
bodies than for antibodies further away from the 
consensus. This system of defined frameworks 
might, in addition, provide excellent access to 
studying this question of domain orientation 
experimentally. 

Construction of the seven V H and seven V L 
master genes 

The final result of the analysis described above 
was a collection of 14 amino acid sequences, which 
represent the frequently used antibody repertoire 
of the human immune system. These sequences 
were then back-translated into DNA sequences. In 
a first step, the back-translation was carried out 
using only codons that are known to be used fre- 
quently in E. coli. In a second step, these gene 
sequences were then examined for all possible 
restriction endonuclease sites, which could be 
introduced without changing the corresponding 
amino acid sequences. This was done by creating a 
database of all possible silent cleavage sites for 
each gene. Using this database, cleavage sites were 
selected that were located close to the CDR and 
framework borders and that could be introduced 
into all V H , Vk or Vλ genes simultaneously at the 
same position. This was considered essential to the 
overall strategy, as CDRs (or frameworks) can then 
be shuffled within pools of sequences, without 
even knowing the individual antibody sequence. 
In a few cases it was not possible to find a com- 
mon cleavage site for all genes at one of the flank- 
ing regions. In that case, one amino acid residue of 
the sequence was changed if this change seemed to 
be feasible according to the available sequence and 
structural information as delineated in the molecu- 
lar modeling section. Each sequence was then ana- 
lyzed again after exchange as described above. 
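
The search for silent cleavage sites can be sketched in Python as follows. The sketch substitutes a recognition sequence at every base offset of a gene and keeps the offsets at which the translated protein is unchanged. It then intersects these offsets across several genes to find positions common to all of them. It uses the standard genetic code and the EcoRV recognition sequence GATATC; the toy genes, the function names and the simplification that only bases inside the site window are altered are assumptions of this example, and it is not a description of the software actually used.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA sequence (length divisible by 3)."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def silent_site_positions(dna, site):
    """Base offsets where `site` can be written in without changing the protein."""
    protein = translate(dna)
    positions = []
    for start in range(len(dna) - len(site) + 1):
        candidate = dna[:start] + site + dna[start + len(site):]
        if translate(candidate) == protein:
            positions.append(start)
    return positions


def common_silent_positions(genes, site):
    """Offsets at which the site is silent in every gene (genes assumed aligned)."""
    position_sets = [set(silent_site_positions(g, site)) for g in genes]
    return sorted(set.intersection(*position_sets))


# Hypothetical gene fragments encoding D-I-Q-M and D-I-V-L.
ECORV = "GATATC"
toy_genes = ["GACATTCAGATG", "GATATCGTGCTG"]
shared = common_silent_positions(toy_genes, ECORV)
```

In the toy genes, GATATC in frame encodes aspartate-isoleucine, so the site is silent at offset 0 of both fragments. A fuller search would also consider synonymous changes in codons that only partly overlap the site, which this sketch leaves out.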

In total, six amino acid residues were exchanged 
during the design of the genes: T H3 Q (V H 2), S H42 G 
(V H 6), E L1 D and I^V (Vk3), K L24 R (Vk4) and T L22 S 
(Vλ3). Additionally, the first two amino acid resi- 
dues of all three Vλ sequences were changed to 
aspartate-isoleucine in order to introduce an EcoRV 
site common to all V L genes. After this design, 
only one element junction remained where no com- 
mon cleavage site could be found. For this region 
(the border between CDR2 and framework 3 in the 
V H sequences), two different types of cleavage sites 
were used instead: BstEII for V H 1A, V H 1B, V H 4 
and V H 5; and NspV for V H 2, V H 3, V H 4 and V H 6. 

During this analysis, several potential restriction 
endonuclease sites were identified that could be 
introduced into every gene of a given group with- 
out changing the amino acid sequence, but which 
were not located at the flanking regions of the 
CDR or framework elements. The introduction of 
these cleavage sites made the system more flexible 
for further improvements. Finally, each gene 
sequence was modified again to remove, with the 
exception of the common restriction sites, all but 
one of the other sites (with a length of the recog- 
nition site of five or more bases), since this unique 
site might be used as a "fingerprint site" to differ- 
entiate the genes by restriction digest. All these 
changes were again carried out without changing 
the corresponding amino acid sequence. The 14 
final protein sequences, including the introduced 
restriction pattern are shown in Figure 3. 

The resulting consensus protein sequences were 
finally compared to the germline sequences, and a 
mean deviation of all 49 consensus sequences from 
their closest germline counterparts of 4.9(±3.6) resi- 
dues was found. Thus, these consensus sequences 
are, on average, much more related to the germline 
sequences than the majority of rearranged 
sequences found in the database (mean deviation 
14.7 amino acid residues). In contrast to the "orig- 
inal" germline sequences, however, our synthetic 
versions have all the advantages of sequences with 


known and predictable unique restriction sites at 
the framework/ CDR borders. 

The consensus gene fragments were then 
assembled from oligonucleotides by SOE-PCR 
assembly (see Materials and Methods for details). 
Gene segments encoding the human constant 
domains C H 1 (subtype IgG1), Ck and Cλ1 were 
designed with optimized E. coli codon usage and syn- 
thesized in order to create F ab fragments for dis- 
play or expression (see Materials and Methods). 
After synthesis, the gene fragments were 
assembled and inserted individually into the 
expression vector pBS12, yielding 49 single-chain 
Fv genes containing identical dummy V H and V L 
CDR3s. The general format of the scFv genes is 
shown in Figure 5. All 49 master genes were also 
cloned in the reverse oriented scFv format (V L -V H ) 
as well as in the F ab format for future libraries 
(data not shown). 

E. coli expression analysis 

The E. coli expression of the 49 scFv genes (all 
containing the same V H and V L CDR3s from the 
antibody hu4D5, see Carter et al, 1992a) was stu- 
died in a manner similar to that described by Knappik & 
Plückthun (1995). We found that all 49 master 
genes could be expressed as soluble proteins in the 
periplasm of E. coli, yielding a band of the correct 
size in FLAG Western blots of soluble E. coli crude 
extracts (data not shown). This indicates that all 49 
combinations are most likely capable of forming 
V H /V L pairs, since unpaired domains tend to 
aggregate (Wall & Plückthun, 1999). 

The ratio of soluble to insoluble expressed pro- 
tein was quantified from Western blot experiments 
for each scFv gene, since this value has been 
shown to be correlated to the expression behavior 


Figure 3. Protein sequences of the HuCAL V H and V L master genes. An alignment of the seven V L and seven V H 
sequences is shown, together with the approximate location of restriction endonuclease sites that were introduced 
into the corresponding DNA sequences. The alignment, numbering and loop regions (L1-L3, H1-H3) are according to 
structural criteria defined by Chothia et al. (1992), Tomlinson et al. (1995) and Williams et al. (1996). The H3 loop is
given as defined by Chothia & Lesk (1987), although more recently, the extended H3 loop has been defined to
include residues 92 and 104 (Morea et al., 1998). CDRs are according to Kabat et al. (1991). Color codes indicate:
(a) the structurally least variable regions used for least-squares superposition of the Cα coordinates of structures and
models (residues L3-L7, L20-L24, L33-L39, L43-L49, L62-L66, L71-L75, L84-L90 and L97-L103 for V L ; H3-H7, H19-H23,
H34 to H40, H44-H50, H67-H71, H78-H82, H88-H94 and H102-H108 for V H ) indicated as gray bars. (b) The
average relative side-chain solvent-accessibility in the isolated domains, indicating the average side-chain solvent- 
accessibility for each position: 100% indicates a solvent-accessible surface of the same side-chain in the context of a 
poly(Ala) peptide in extended conformation. Strongly buried positions (less than 30 % of the side-chain surface is sol- 
vent-accessible) are additionally marked by B, semi-buried positions (less than 50 % of the side-chain surface is sol- 
vent accessible) are additionally marked by b. (c) The average loss of side-chain solvent-accessible surface upon 
formation of the V L /V H dimer interface, indicating residues directly contributing to the dimer interface. Positions 
strongly buried upon interface formation (more than 80% of the residual solvent-accessible surface buried in the 
interface) are additionally marked by I, and semi-buried positions (more than 40 % of the residual solvent-accessible 
surface buried in the interface) are additionally marked by i. (d) The average loss of side-chain solvent-accessible
surface upon formation of the VL/CL and VH/CH interface in the Fab fragment. (e) Average loss of side-chain
solvent-accessible surface upon binding of the antigen. Positions strongly buried upon antigen binding (more than 80 % of
the residual solvent accessible surface buried in the interface) are additionally marked by I, and semi-buried positions 
(more than 40% of the residual solvent accessible surface buried in the interface) are additionally marked by i. 
(f) Average deviation of the Cα positions of all V L or V H structures, respectively, in the PDB database
(http://www.rcsb.org/pdb/) from the average Cα positions.

Figure 4. Coverage of the range of conformational variability of natural antibodies by the HuCAL frameworks. The
homology models of the 14 HuCAL framework structures were generated using the program InsightII, modules
Homology, Biopolymer and Discover (Biosym/MSI, San Diego, CA) as described in Materials and Methods. For
CDR3, the sequence of antibody hu4D5 was used in all the models. The resulting V L and V H models were aligned by
least-squares superposition of the Cα coordinates of residues L3-L7, L20-L24, L33-L39, L43-L49, L62-L66, L71-L75,
L84-L90 and L97-L103 for V L and H3-H7, H19-H23, H34 to H40, H44-H50, H67-H71, H78-H82, H88-H94 and
H102-H108 for V H (indicated in white). For comparison, 100 non-redundant V L and V H structures (mouse and human)
were taken from the RCSB protein structure database (http://www.rcsb.org/pdb/) and aligned. (a) HuCAL V L
models and (b) X-ray structures: cyan, kappa chains; blue, kappa chains lacking cis-Pro L8 (mouse only); pink,
lambda chains. (c) HuCAL V H models and (d) X-ray structures color-coded according to the sequence pattern
correlating with the framework structure subtypes: magenta, H6 = Glu, H9 = Pro; pink, H6 = Glu, H9 = Gly; cyan,
H6 = Gln, H9 = Ala; blue, H6 = Gln, H9 = Pro. The fourth conformation not covered by the HuCAL models shows
some correlation with the presence of Pro in position H7, which is very rare in human sequences (<1 %), but fre- 
quently seen in mouse sequences (in about 22 % of the sequences). 


of antibody fragments (Knappik & Plückthun,
1995; Nieba et al., 1997; Jung & Plückthun, 1997).
In each separate expression experiment, the 
HuCAL H3k2 master gene was included as an 
internal control. The results are given in Table 2. 
The HuCAL genes were found to show a higher 
ratio of soluble to insoluble protein than many
antibody genes obtained from natural monoclonal


antibodies and subsequently expressed in E. coli. 
The ratio of soluble to insoluble protein ranges
from 33% (H4λ2) to 90% (H1Aλ1 and H6κ1),
whereas a wide range of ratios has been found
from natural antibody fragments, including many
with ratios much below 30 % under similar experimental
conditions (Forsberg et al., 1997; Nieba
et al., 1997). We could not find a correlation

[Figure 5 schematic labels: phoA, FLAG, CDR3, 20 aa linker, CDR3]

Figure 5. Arrangement of HuCAL scFv in the V H -V L orientation. The scFv gene cassette is preceded by a phoA signal
sequence and a short FLAG tag. The two domains are fused by a 20-residue linker. The unique restriction sites
common to all master genes are shown, and the location of the CDR3 regions is indicated.


between the type of V L gene and expression beha- 
vior of the corresponding scFv genes, but it seemed 
that the genes encoding the V H 3 or V H 1A domains
show higher soluble-to-insoluble ratios in
almost all combinations (Table 2). These initial 
findings clearly need to be extended by a more 
detailed biophysical characterization. 

The amounts of soluble protein produced, when 
compared to the H3k2 gene set as 100%, ranged 
from 26% to 212% (data not shown), indicating 
that soluble expression yields for all combinations
fall into a narrow range. Although we must expect
that differences in the CDRs after randomization 
and selection of binders may influence the range of 
expression yields seen with the master genes, the 
use of well-expressed frameworks for creating 
libraries increases the chance to select well- 
expressed binding antibodies and reduces the large 
imbalances in the display efficiencies. 

The CDR3 sequence introduced as dummy 
sequence in all V L genes was taken from a V L 
kappa gene (see above). Since this Vκ CDR3 contained
a cis-proline residue at position 95, creating
an omega-loop that is normally not found in V L
lambda CDR3s, and which might influence the
folding and hence the expression behavior of the
corresponding scFv genes, a Vλ dummy consensus
CDR3 cassette encoding the sequence 89 QSYDSSLS
was designed and used to replace the Vκ dummy
CDR3 in the H3M scFv gene. Interestingly, however,
no significant difference in expression yields
could be detected (data not shown).

The expression behavior of two randomly cho- 
sen scFv genes (H2k2 and H3k2) was analyzed in 
more detail. These two genes were selected from 


panning experiments after library creation (see 
below) and therefore contained CDR3 sequences 
different from the dummy sequence of the master 
genes. Since both scFv fragments bound the anti- 
gen they were selected on, we could use ELISA 
experiments to determine the amount of active 
material in the lysates after different times of 
induction. The results are shown in Figure 6. We 
found that the expression titer after five hours of 
induction at 30 °C was 6 mg (H2k2) and 10 mg 
(H3k2) per liter of shaking-flask culture. The 
expression titer stayed constant for several hours 
and then decreased, probably due to the start of 
cell leakiness. This observed expression yield is sig- 
nificantly higher than that reported for antibody 
fragments from other libraries (Griffiths et al, 1994; 
Vaughan et al, 1996). 

Design and construction of CDR3 
library cassettes 

Our rational approach to creating an antibody 
library aims at defining, with the smallest number 
of molecules possible, a structural diversity as 
large as possible. At the same time, it was import- 
ant to design molecules that are likely to be stable 
and fold well. Furthermore, it was essential to 
direct the sequence diversity to those residues 
most likely in contact with the antigen. We decided 
for the first set of HuCAL libraries to randomize 
both CDR3 regions of the V H and V L genes simul- 
taneously, since these two regions form the inner 
circle of the antigen binding site, and therefore 
show the highest frequency of antigen contacts in 
structurally known antibody-antigen complexes. In 
order to obtain the highest degree of diversity in 


Table 2. Expression analysis of HuCAL master genes

Figure 6. Growth curves and expression kinetics for
two HuCAL scFv fragments, (a) Gene derived from the 
H3k2 HuCAL framework; (b) gene derived from the 
H2k2 HuCAL framework. The growth curves (circles) 
were determined by measuring the absorbance at 
600 nm at the indicated time-points. For comparison, 
the growth curve of the uninduced culture (open circles) 
is given in (a). The arrows indicate the time-point of 
induction. The amount of functional scFv (squares) at 
the different time-points was determined by ELISA 
measurements of crude extracts. The corresponding 
purified antibody fragments of known concentration, 
also measured in the presence of a corresponding 
amount of cell extract, served as internal standard to 
calculate the scFv amount based on the ELISA signal 
obtained. The mean and standard deviation of three 
different measurements is given for each experiment. 


the V H CDR3, which is also the most variable 
region in natural antibodies, we applied the follow- 
ing strategy for library generation: first, we 
designed V L CDR3 library cassettes strongly biased 
for the known natural distribution of amino acids 
(see below) with relatively low complexity and 
inserted those in the V L master genes, aiming at a 
library size of about 10^7 members. Subsequently,
we used these V L libraries to insert a V H CDR3 
library cassette with very high complexity (both in 
terms of sequence composition and length vari- 


ation), ensuring that every single library member 
contains a unique V H CDR3 sequence. 

Since we used trinucleotides (Virnekäs et al.,
1994) for the generation of the CDR3 library cassettes
(see below), we could introduce any amino
acid bias at any position of the cassettes. We
decided to first analyze the sequence variability in 
the CDR3 regions of our databases of human 
rearranged antibody sequences and use this infor- 
mation together with structural data for the library 
design, in order to bias the CDR3 sequences 
towards the naturally found human antibodies. 

Vκ CDR3

A total of 382 sequences of rearranged antibodies 
from our initial internal database were analyzed. 
In the following discussion, we will use the numbering
system and definitions of CDR regions
introduced by Kabat, even though this does not
always correspond to the structural definitions
(Chothia & Lesk, 1987; Barre et al., 1994; Giudicelli
et al., 1997).

A fraction of 72.3% of all CDR3s had a CDR 
length of eight amino acid residues, the remaining 
sequences had CDR lengths of less than seven 
(1.8%), seven (7.3%), nine (17.3%), and ten (1.3%) 
residues. Because of the predominance of CDRs of 
eight residues, we decided to consider just that 
size for constructing a CDR3 library. The omega-loop
structure of Vκ CDR3 is determined by a
characteristic cis-proline residue at position 95,
which is encoded in 96% of all κ germline genes,
but can be lost upon V-J rearrangement. A total of
six canonical structures have been discussed, with
structural data being available for structures 1 and
2 (Al-Lazikani et al., 1997). In canonical structure 1,
residues 90 and 95 are predominantly occupied by
glutamine and proline, respectively, whereas in
structure 2, the presence of cis-proline at position
94 is characteristic. About 87% of all 382 sequences
had Q L90 , and 78 % had P L95 , whereas P L94 was pre- 
sent in only 1% of all sequences. Therefore, we 
decided to base the design of Vk CDR3 on struc- 
ture 1. Besides the canonical residues, position 89 
showed a strong conservation, with glutamine present
in 89 % of all sequences. Residues 89 and 90
are not part of the region outside the β-strand
forming the CDR-L3 loop, which comprises residues
91 to 96 (Chothia & Lesk, 1987). Within CDR-L3,
a high degree of variability (except for position
95 mentioned before) can be seen, with some preference
for tyrosine at position 91. This corresponds
well with the inspection of antigen contact
residues in structurally known antibody-antigen 
complexes, showing that positions 91 to 94 and 96 
seem to play the most important role (see Figure 3). 

In our design of the library (see Figure 7(b)), we 
kept position Q L90 constant. Besides being a cano- 
nical residue, the side-chain of this glutamine resi- 
due does not contribute to the antigen-binding 
pocket, but points in the opposite direction. In the 
trinucleotide mixture, we biased positions 89 and

[Figure 7 panel labels: (a) V H ; (b) V L kappa; (c) V L lambda]



Figure 7. Comparison between design and experimental composition of CDR3 libraries used. For each position of 
the CDR3 region (numbering according to Kabat et al., 1991; for HCDR3 the position before H101 is numbered 100z,
the length variable region is numbered from H95 to H100s), the amino acid composition in the planned libraries
(P, left columns) is compared with the composition found from sequencing 257 clones of the initial libraries (F, right 
columns). The TRIM mixture indicates the mixtures of trinucleotides used in the oligonucleotide synthesis (see Table 3 
of the Supplementary Material). Occupied indicates the number of amino acids encoded by the respective mixture 
and found in the sequenced clones, respectively. 


95 strongly towards glutamine and proline, 
respectively. A limited set of trinucleotide codons 
was allowed for positions 92 and 93, despite the 
fact that a large number of different residues can
be found there, because the side-chains of these
residues point away from the V H CDR3 contact 
side. In contrast, for position 91, 18 amino acids 
(all except cysteine and proline) were allowed 
(biased towards Y L91 ). Since proline is never found 
at position 91 in germline or rearranged sequences, 
it could be that P L91 would not allow the loop to 
form the correct conformation. Cysteine was 
omitted, since it was almost never found and it 
might cause problems during phage panning and 
later expression because of disulfide formation. 
Accordingly, for positions 94 and 96, all amino 
acids except cysteine were allowed. The residues 
located at those three most strongly randomized 
positions point into the binding pocket. By focus- 
ing the diversity towards positions that are most 
likely in contact with the antigen, we could reduce 
the overall theoretical diversity to a value of
1.3 × 10^6, which ensured that the theoretical diversity
will be present in the final library.

For the four different HuCAL Vκ consensus
genes, three trinucleotide-containing oligonucleotides
were synthesized. A single oligonucleotide
for Vκ1 and Vκ3 was possible, since both differ
only at position 85 (κ1 T L85 ; κ3 V L85 ) and could
thus be synthesized by using a mixture of two trinucleotides
encoding threonine and valine in a 1:1
ratio at the appropriate position. Structural inspec- 
tion revealed that residue 85 has no contact to 
other residues, thus making it likely that an 
exchange of these two similar amino acids would 
not cause problems. Indeed, we found the expected 
1:1 ratio at this position after library construction 
and sequencing of clones (data not shown). 


For oligonucleotide synthesis, six different trinu- 
cleotide mixtures (Tk2 to Tk6, see Figure 7(b)) had 
to be prepared comprising two to 19 codons, either 
biased or equally distributed. While initial results 
had suggested that different trinucleotides couple 
with different relative coupling yields (Virnekäs
et al., 1994), more controlled subsequent experimentation
showed that these differences were not
systematic (data not shown). Trinucleotide
mixtures were therefore prepared directly using the desired
molar ratios, thereby implicitly assuming an equal
coupling yield. During oligonucleotide synthesis,
the stepwise coupling ratio for trinucleotide mix- 
tures ranged from 95.5% to 97.5%, the overall 
yield per oligonucleotide from 44 % to 68 %. 

After cassette preparation, restriction digest and 
purification, the cassettes were ligated into the four 
Vκ consensus genes using the unique restriction
sites BbsI and MscI, and the ligation mixtures were
electroporated into E. coli TG1 cells. We obtained
6 x If/ independent colonies, and hence an almost 
complete coverage of the theoretical diversity. The 
quality of the cassettes was then checked by 
sequencing 235 independent clones. A total of 175 
clones (75 %) were completely correct and showed 
the library composition as planned. Four clones 
contained an unplanned amino acid at one pos- 
ition, which was most likely due to single-base 
mutations introduced during cassette preparation, 
three clones contained a one-base and six clones 
contained a one-codon deletion in the trinucleo- 
tide-encoded region. All other non-correct clones 
had the library cassette inserted twice or in the 
reverse orientation, or they contained one-base del- 
etions in the 5' mononucleotide region of the oligo- 
nucleotide. In order to obtain more statistical data 
on codon incorporation, all codons originating 
from trinucleotide positions were analyzed. 
Figure 7(b) shows the result of that analysis. Overall,
the data are in excellent agreement with the
expected distribution. Q L89 , Y L91 , and P L95
appeared almost exactly as planned at these 
strongly biased positions. 


Vλ CDR3

A total of 147 rearranged human Vλ sequences
were collected and analyzed. The lengths of the 
CDR3s (positions 89 to 96) ranged from seven to 
12 residues, the majority (92%) having between 
eight and ten residues (L3 loop lengths six to eight 
according to Chothia & Lesk, 1987). Therefore, we 
decided to construct a CDR library comprising 
these three different length variants. Analysis of 
the amino acid composition in the rearranged 
sequences revealed a high degree of variability at 
positions 93 to 96, and to a smaller extent at pos- 
itions 89 to 92. The inspection of antigen contact 
residues in the case of an antibody of canonical 
structure 1 (Chothia & Lesk, 1987, see Figure 3) 
revealed that positions 91, 94, and 96 seem to play 
the most important role. A single Vλ CDR3 oligonucleotide
for all three Vλ consensus genes was
designed, where Q L89 and S L90 were kept constant,
since neither position is part of the loop region.
Similarly, D L92 was fixed as the most frequent
amino acid at that position and because its side-chain
points away from the binding pocket. Residue
91, which packs against V H CDR3, was limited
to the three most frequent amino acids found in 
the database (arginine, tryptophan and tyrosine). 
At positions 93 to 95B, an equimolar mixture of all 
amino acids except for cysteine and tryptophan 
was allowed, since cysteine and tryptophan were 
never found in the rearranged sequences. Position 
96 was completely randomized, except for 
cysteine. 

Since the framework 3 region adjacent to the 
CDR3 of all three Vλ master genes is identical, we
could use a single oligonucleotide for all three 
genes. For oligonucleotide synthesis, three different 
trinucleotide mixtures had to be prepared compris- 
ing three biased codons, 18 or 19 codons (in both 
cases equally distributed). The three mixtures and 
their positions in the CDR3 are given in Figure 7(c). 
On average, the stepwise coupling ratio for trinu- 
cleotide mixtures was about 98.9%, the overall 
yield for the oligonucleotide was 80%. During 
oligonucleotide synthesis, we used four consecu- 
tive sub-stoichiometric coupling steps at the triplet 
position corresponding to residue 95A. Thereby, 
we created an oligonucleotide with variable length 
covering CDR3 lengths between eight and 11 
amino acid residues, with the smallest fraction 
having a CDR3 length of 11 residues. The theoretical
diversity of these length variants ranged from
3.3 × 10^5 (eight residues) to 1.9 × 10^9 (11 residues).
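
The two diversity figures quoted above can be reproduced by multiplying the number of amino acids allowed at each position. The following is a minimal sketch, assuming (as an interpretation of the text, not a statement from it) that the eight-residue variant has three 18-fold randomized positions between residue 92 and residue 96 and that each additional residue adds one more such position. The function and constant names are illustrative only.

```python
# Amino acids allowed per randomized position, taken from the design described
# in the text. Q89, S90 and D92 are constant and contribute a factor of 1.
CHOICES_POS_91 = 3       # Arg, Trp or Tyr
CHOICES_POS_93_95B = 18  # all amino acids except Cys and Trp
CHOICES_POS_96 = 19      # all amino acids except Cys


def vlambda_cdr3_diversity(cdr3_length: int) -> int:
    """Theoretical number of distinct CDR3 sequences for one length variant."""
    if not 8 <= cdr3_length <= 11:
        raise ValueError("the cassette covers CDR3 lengths of 8 to 11 residues")
    # Assumption: length 8 carries three 18-fold positions; every additional
    # residue inserts one more 18-fold position.
    n_variable = 3 + (cdr3_length - 8)
    return CHOICES_POS_91 * CHOICES_POS_93_95B ** n_variable * CHOICES_POS_96


for length in range(8, 12):
    print(length, f"{vlambda_cdr3_diversity(length):.1e}")
```

Under these assumptions the eight-residue variant gives 332,424 sequences (about 3.3 × 10^5) and the 11-residue variant 1,938,696,768 (about 1.9 × 10^9), matching the range stated in the text.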

After cassette preparation, restriction digest and 
purification, the cassette was ligated into the three
Vλ consensus genes using the unique restriction
sites BbsI and HpaI, and the ligation mixtures were
electroporated into E. coli TG1 cells. We obtained
5.7 × 10^6 independent colonies.
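
How far a given number of independent colonies covers a theoretical diversity can be estimated with a simple sampling model. The sketch below is not the authors' calculation. It assumes every variant is equally likely and that all colonies carry a correct cassette, so it gives an upper bound at best. The function name is hypothetical.

```python
import math


def expected_coverage(n_clones: float, diversity: float) -> float:
    """Expected fraction of distinct variants sampled at least once.

    Poisson approximation: each variant is drawn n_clones / diversity times
    on average, so it is missed with probability exp(-n_clones / diversity).
    """
    return 1.0 - math.exp(-n_clones / diversity)


colonies = 5.7e6
# Smallest and largest theoretical diversities of the length variants.
for diversity in (3.3e5, 1.9e9):
    print(f"{diversity:.1e}: {expected_coverage(colonies, diversity):.4f}")
```

With these inputs the shortest length variant would be sampled essentially completely, whereas only a fraction of a percent of the longest variant's sequence space could be present.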

As described above for Vκ, the quality of the oligonucleotide
was checked by sequencing (183
independent clones). Again, about 26 % of the
sequences were found to be incorrect, with errors of the
type similar to those found in the Vκ CDR3s. A
total of 74 % of all clones, however, had completely 
correct CDR cassettes. The amino acid composition 
was again in very good agreement with the desired 
distribution, except for Y L91 , which was over-rep- 
resented at the expense of W L91 (see Figure 7(c)). 
The length distribution was also analyzed: we 
found that the majority contained a CDR3 length 
of eight (36 %) or nine (42 %) residues, the rest had 
a length of ten (21 %) or 11 (2 %) residues. 

V H CDR3 

For the highly variable V H CDR3s, all available 
rearranged sequences were grouped together, irre- 
spective of the individual sub-families. A total of 
572 sequences were analyzed. The analysis 
revealed that only position H101 is strongly biased
(toward aspartate in 82% of all cases). This is in 
agreement with the findings that R H94 and D H101 
form a highly conserved salt-bridge (Searle et al, 
1995), and that these two residues are critical for 
the "kinked base" (Shirai et al, 1996) or "bulged 
torso" (Morea et al, 1998) structure of the CDR3 
loop. D H101 was therefore kept constant, although 
this limits the structural variability to only a subset 
of CDR-H3 conformations, as other structures are 
seen in antibodies devoid of the R H94 -D H101 salt-bridge.

Again, the observed variability corresponds well 
with the information obtained by inspection of 
antigen contact residues, showing that positions
H95 to H100y seem to play the most important
role, whereas H100z is involved to a lesser extent
(see the legend to Figure 7 for HCDR3 position
nomenclature). Position H102 was found not to be
important for antibody/antigen interactions (see 
Figure 3). 

When designing the library cassette, we decided 
to base the composition of the trinucleotide mixtures
for all positions except for H100z and H102
on the overall amino acid composition of the natural
heavy chain CDR3s. Positions H100z and H102
were analyzed separately. This resulted in three
different codon mixtures, named TH1 (for H95 to
H100y), TH2 (for H100z), and TH3 (for H102). The
compositions of these mixtures are given in 
Figure 7(a). 

Analysis of the length variability of CDR3 (pos- 
itions 95 to 102) showed a range between four and 
28 residues with a maximum at 13.0. Wu et al. 
(1993) found a mean length of 11.6 residues in 
their analysis of human antibody sequences. To be 
able to cover such a broad spectrum of length var- 
iants, two separate oligonucleotides were syn- 
thesized using the sub-stoichiometric coupling 
approach to create the shorter library CDR3Ha,
comprising five to 22 residues and the longer
library CDR3Hb, comprising nine to 28 residues. 
Since the two length variants were kept separated 
during library construction (see below), their use 
might be adapted to the antigen in question. More- 
over, by mixing these two libraries appropriately, 
it is possible to mimic the natural length diversity. 
The final yields for oligonucleotides CDR3Ha and 
CDR3Hb were 68% and 74%, respectively, and the 
sub-stoichiometric coupling rates varied between 
35% and 55%. Based on these coupling rates, a 
theoretical length distribution for the two libraries 
CDR3Ha and CDR3Hb was calculated (see 
Figure 8). 
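
One way to picture how a theoretical length distribution follows from sub-stoichiometric coupling is sketched below. This is a simplified model and not the calculation used for Figure 8. It assumes each optional coupling step adds a codon to a chain independently with the given rate, and the uniform rate of 0.45 is a hypothetical value chosen from within the reported 35% to 55% range.

```python
def length_distribution(min_length: int, coupling_rates: list[float]) -> dict[int, float]:
    """Return {CDR3 length: probability} for a series of optional coupling steps.

    Each rate is the fraction of chains extended by one codon in that step;
    chains that are not extended simply stay one residue shorter.
    """
    dist = [1.0]  # dist[k] = probability that k optional codons were added
    for rate in coupling_rates:
        nxt = [0.0] * (len(dist) + 1)
        for added, prob in enumerate(dist):
            nxt[added] += prob * (1.0 - rate)  # step skipped on this chain
            nxt[added + 1] += prob * rate      # codon coupled on this chain
        dist = nxt
    return {min_length + added: prob for added, prob in enumerate(dist)}


# Shorter cassette: five to 22 residues, i.e. 17 optional positions.
dist = length_distribution(5, [0.45] * 17)
mean_length = sum(length * prob for length, prob in dist.items())
print(f"mean length: {mean_length:.2f}")
```

Passing the actual per-step coupling rates instead of a uniform value would give the corresponding distribution under the same independence assumption.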

After cassette preparation, restriction digest and 
purification, the cassettes were inserted into the 
scFv libraries already containing the randomized 
V L CDR3s described above. We mixed all four Vκ
and all three Vλ libraries before HCDR3 insertion,
but we kept the V H consensus genes separate
(except V H 1A and V H 1B, which were also mixed).
Hence, 24 separate libraries were created (V H 1 to
V H 6, each either with four κ or three λ genes, and


each either with the short or the long HCDR3 
cassette). After electroporation into E. coli TG1
cells, we obtained altogether 2.1 × 10^9 independent
colonies. 

The quality of the V H CDR3s was checked by
sequencing 257 clones. In Figure 7(a) the amino
acid distributions for the trinucleotide mixtures 
TH1, TH2, and TH3 are given, showing again an 
excellent agreement with the calculated and 
designed frequencies. The sequencing results 
obtained from both V H CDR3 length variants 
revealed that both library types follow a Gaussian 
length distribution, with the maxima at 9.0 and 
16.6 residues (Figure 8). Thus, the actual length 
distribution was shifted towards shorter lengths 
when compared to the theoretical length distri- 
bution, but the whole range of naturally occurring 
length variants was covered by the two library 
variants. 

The final library was designated HuCAL1.
Altogether, we found the fraction of fully correct 
library members with CDRH3 and L3 as designed 
to be 61 %.

[Figure 8 axis label: HCDR3 length (positions 95 to 102)]

Figure 8. Distribution of HCDR3 length variants in
the HuCAL1 libraries. The results from (a) the trinucleotide
cassette HCDR3a, and from (b) the cassette HCDR3b
are shown (black columns) and compared to the length 
distribution as calculated from the substoichiometric 
coupling (gray columns). For details, see the text. 


Diversity and binding constants 

Phage-display as well as ribosome-display selec- 
tion experiments were performed against a variety 
of antigens, including proteins, peptides, or whole 
cells. The HuCAL1 library comprising all 49 combinations
was used for selection experiments. Two
or three panning rounds of phage display, or five 
or six rounds of ribosome display were performed 
in each case. After the final round, the selected 
scFv genes were subcloned as a pool in an 
expression vector and the transformants were 
screened for binding using ELISA or FACS assays. 
Details about the selection experiments and the 
characterization of binders will be given elsewhere 
(Krebs et al, unpublished results; Hanes et al, 
unpublished results). In the great majority of cases, 
many different scFv fragments could be identified, 
which bound the antigen specifically. The V H and 
V L framework usage for the first 250 specific binders
selected from HuCAL1 via phage display is
given in Table 3. All V H and V L frameworks could 
be selected, and so far 42 of the 49 framework combinations
were found to be used. While the V H 4
gene segment is rarely used, the V H 3 gene segment
predominates. The predominance of V H 3 occurs 
also in nature (see Table 1) and is even higher in 
other libraries (Griffiths et al, 1994; Vaughan et al, 
1996). All other HuCAL frameworks seem to be 
used with similar frequency. There is also a con- 
siderable variation in V H CDR3 length: the first 250 
specific binders range from four to 24 residues 
(data not shown). 
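
The framework usage figures in Table 3 can be cross-checked against the statements above. The following sketch recomputes the totals. The placement of the empty cells (entered here as 0) is an assumption inferred from the printed row and column sums, since the table layout does not show them unambiguously. Column names use "k" and "l" for kappa and lambda.

```python
VL_FRAMEWORKS = ["k1", "k2", "k3", "k4", "l1", "l2", "l3"]

# Number of specific binders per VH/VL combination (Table 3); 0 = none found.
COUNTS = {
    "H1A": [4, 1, 3, 3, 1, 3, 7],
    "H1B": [3, 0, 4, 2, 1, 6, 1],
    "H2":  [2, 5, 11, 7, 11, 4, 15],
    "H3":  [12, 5, 16, 11, 15, 7, 25],
    "H4":  [0, 0, 0, 0, 2, 0, 1],
    "H5":  [4, 1, 5, 3, 1, 5, 14],
    "H6":  [8, 0, 3, 5, 2, 3, 8],
}

row_totals = {vh: sum(cells) for vh, cells in COUNTS.items()}
col_totals = dict(zip(VL_FRAMEWORKS, (sum(col) for col in zip(*COUNTS.values()))))
total_binders = sum(row_totals.values())
used_combinations = sum(1 for cells in COUNTS.values() for n in cells if n > 0)

print("binders:", total_binders)
print("combinations used:", used_combinations, "of", len(COUNTS) * len(VL_FRAMEWORKS))
print("VH3 share:", round(100 * row_totals["H3"] / total_binders), "%")
```

With this cell assignment the totals agree with the 250 binders and the 42 of 49 used combinations reported in the text, and V H 3 accounts for 36% of the binders.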

Selected binders were purified to homogeneity 
using affinity chromatography or IMAC, and their 
monovalent binding constants were measured 
using surface plasmon resonance (BIAcore). As 
shown in Table 4, binding constants of peptide bin- 
ders were in the micromolar range, whereas affi-

Table 3. Framework usage

        κ1   κ2   κ3   κ4   λ1   λ2   λ3     Σ    %

H1A      4    1    3    3    1    3    7    22    9
H1B      3    -    4    2    1    6    1    17    7
H2       2    5   11    7   11    4   15    55   22
H3      12    5   16   11   15    7   25    91   36
H4       -    -    -    -    2    -    1     3    1
H5       4    1    5    3    1    5   14    33   13
H6       8    -    3    5    2    3    8    29   12

Σ       33   12   42   31   33   28   71   250
%       13    5   17   12   13   11   28

For each of the 49 HuCAL framework combinations, the number of specific scFvs from a collection of 250 binders against about 
50 different antigens (haptens, peptides, proteins) is shown. All clones have been selected by phage display. The identity of the fra- 
mework was determined by sequencing. 


nities to protein antigens were usually in the low 
nanomolar range. 
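
The affinities in Table 4 are related to the measured rate constants by K D = k off / k on. The sketch below checks this for the rows that list both rates. It assumes the k on column is given in units of 10^5 M^-1 s^-1 (the exponent is not legible in the table header and is inferred here from the listed affinities) and the k off column in units of 10^-2 s^-1. The function name is illustrative.

```python
# (scFv, reported affinity in nM, k_on in 1e5 M^-1 s^-1, k_off in 1e-2 s^-1)
TABLE_4_ROWS = [
    ("ICAM1-1", 9.4, 2.13, 0.20),
    ("ICAM1-15", 72.7, 1.72, 1.25),
    ("A9-1", 246.0, 1.34, 3.30),
    ("3B2", 1130.0, 1.85, 21.0),
    ("C22-2", 610.0, 1.41, 8.6),
    ("27HA1", 1600.0, 0.55, 8.8),
]


def kd_nanomolar(k_on_1e5: float, k_off_1e_minus2: float) -> float:
    """Dissociation constant K_D = k_off / k_on, converted from M to nM."""
    k_on = k_on_1e5 * 1e5            # M^-1 s^-1 (assumed scale)
    k_off = k_off_1e_minus2 * 1e-2   # s^-1
    return k_off / k_on * 1e9


for name, reported, k_on, k_off in TABLE_4_ROWS:
    print(f"{name:9s} reported {reported:7.1f} nM, computed {kd_nanomolar(k_on, k_off):7.1f} nM")
```

Under the assumed scaling, the computed values agree with the reported affinities to within rounding for these rows, which supports the inferred exponent for the k on column.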

Discussion 

Here, we describe the realization of the concept 
of fully synthetic human antibody libraries, desig- 
nated HuCAL, which are built on seven V H and 
seven V L consensus frameworks, yielding 49 com- 
binations in total. 

We have extensively used these first libraries for 
the successful selection of highly specific binders 
against all kinds of antigens, including haptens, 
DNA, peptides, and proteins, including cell-bound 
receptor antigens (unpublished results). Intrinsic 
affinities down to the sub-nanomolar range were 
found against protein antigens, and the majority of 
binders were found to have dissociation constants 


between 1 and 1000 nM after only two rounds of 
selection. All frameworks have been selected, the 
selected antibodies could be shown to be expressed 
in good yields, they are surprisingly stable against
thermal denaturation, and can be used in typical 
applications like ELISA, immunoblotting, FACS 
analysis, immunoprecipitation or immunohisto- 
chemistry even without any affinity maturation 
steps, verifying the successful design of completely 
synthetic human antibodies described in this 
study. 

Strategy of modular design 

The 49 consensus genes were derived by a step- 
wise analysis of human antibody sequences. First, 
the collected sequences were grouped into families 
according to sequence homology. Second, the 


Table 4. Affinities of HuCAL scFvs

Antigen           scFv       Framework   Affinity         k on × 10^5    k off × 10^-2   App. size
                                         (nM; BIAcore)    (M^-1 s^-1)    (s^-1)          (kDa; SEC)

ICAM-1 a          ICAM1-1    H3λ3        9.4              2.13           0.20            32
ICAM-1 a          ICAM1-15   H5λ2        72.7             1.72           1.25            27
Insulin b         C59        H1Aκ1       0.082 e                                         32
Insulin b         A21        H3κ2                                                        32
CD11b c           Mac1-5     H2U         1.0              7.92           0.09            36
CD11b c           Mac1-29    H2λ3        1.2              1.76           0.03            25
EGFR (human)      A9-1       H2κ2        246              1.34           3.30            25
Mac1 peptide d    3B2        H3λ2        1130             1.85           21.0            30
Hag peptide d     C22-2      H3κ4        610              1.41           8.6             28
NFκB peptide d    27HA1      H3λ3        1600             0.55           8.8             32

Affinity of FPLC-purified antibody monomers measured by SPR on a BIAcore biosensor. Antigens were coupled to CM5 sensor
chips. In order to avoid contamination with multimeric variants, the monomeric scFv fragments were isolated by size-exclusion
chromatography (SEC) (Krebs et al., unpublished results; Hanes et al., unpublished results).
a The extracellular part of human ICAM-1 (residues 26 to 479) was used as antigen.
b Selection against bovine insulin was carried out with ribosome display (Hanes & Plückthun, 1997; Hanes et al., 1998; unpublished
results). These antibodies carry additional point mutations created during PCR amplification.
c The I-domain of human CD11b (residues 149 to 353) was used as antigen.
d The following peptides were synthesized, coupled to a protein carrier and used for antibody selection.
Mac1 peptide: NH 2 -C-DAFRSEKSRQELNTIASKPPRDHVF-COOH
Hag peptide: NH 2 -C-AGPYDVPDYASLRSHH-COOH
NFκB peptide: NH 2 -C-LHVTKKKV-COOH

e Affinities determined with the inhibition BIAcore method, in which a mass transport-limited on-rate is measured as a function of

usage for each germline gene was analyzed by calculating
for each rearranged sequence in the data-
base the germline gene from which it was derived. 
Third, the families of frequently used antibody 
genes were analyzed in terms of structural diver- 
sity of the antigen binding loops, following the 
concept of canonical CDR conformations estab- 
lished by Chothia and co-workers (Chothia et al, 
1989). Fourth, consensus sequences were derived 
from the rearranged sequences, and grouped into 
families of frequently used human antibodies. 
Altogether, the analysis resulted in seven V H, four
Vκ and three Vλ consensus sequences, and our
analysis suggests that this small set of consensus 
genes covers almost the entire structural repertoire 
encoded in those human antibody germline genes 
that were found to be used during the immune 
response. 

Reducing the human antibody repertoire to 49 
distinct Fv frameworks, yet without reducing 
structural diversity, made it feasible to obtain the 
sequences de novo by gene synthesis, which 
enabled us to incorporate several features into the 
genes that facilitate library construction, affinity 
maturation and E. coli gene expression. Moreover, 
the separate construction of the genes and the 
resulting libraries allowed detailed analysis of each 
master framework under defined conditions, 
which is not possible with antibody phage-display 
libraries derived from natural sequences by PCR 
cloning. Particularly, the presence of unique restric- 
tion sites across the whole library makes it possible 
to shuffle CDRs and frameworks, even at the level 
of pools, and without knowledge of the sequence 
of the antibodies. Furthermore, the approach is 
modular and can incorporate future knowledge of 
antibody structure, folding and stability, as indi- 
vidual framework pieces can easily be replaced in 
future versions. 

The availability of separate libraries for each of
the combinations allows one to analyze the performance
of individual framework combinations and to compare
it directly with results obtained from a
mixture or from the natural immune response. It also
provides a way to force the selection against differ- 
ent epitopes on the same protein, which can be a 
very crucial feature given that in vivo applications 
may require the blocking of a binding site on a 
receptor by the antibody, while a different epitope 
on the receptor may be completely immuno-domi- 
nant. In this case, the preferentially selected but 
unwanted framework combination can simply be 
left out. Alternatively, separate affinity enrich- 
ments with subsets of frameworks can be carried 
out to enforce the binding of diverse epitopes. In 
addition, further analysis of the performance of 
this and other libraries may show that particular 
framework combinations contribute little to the 
pool of selected binders, while others need to be 
provided with more initial diversity in CDR1 and 
CDR2. The number of frameworks is, of course, 
arbitrary and can be adjusted by addition of new 
and subtraction of unnecessary ones. 
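
The bookkeeping behind separate framework libraries is simple enough to sketch. The example below is an added illustration: it enumerates the 49 pairings of seven V H with seven V L (four kappa, three lambda) consensus genes and shows how an unwanted combination can be left out. The family labels in `VH` and `VL` and the excluded pair are assumptions chosen for the example, not a statement of which combination is dominant for any antigen.

```python
from itertools import product

# Hypothetical labels: seven heavy-chain and seven light-chain consensus genes
# (k = kappa, l = lambda), matching the counts given in the text.
VH = ["H1A", "H1B", "H2", "H3", "H4", "H5", "H6"]
VL = ["k1", "k2", "k3", "k4", "l1", "l2", "l3"]


def framework_combinations(exclude=()):
    """Return all (V H, V L) pairings except those listed in `exclude`."""
    excluded = set(exclude)
    return [(h, l) for h, l in product(VH, VL) if (h, l) not in excluded]


ALL = framework_combinations()
# Leave out one (assumed) immuno-dominant pairing before selection.
WITHOUT_DOMINANT = framework_combinations(exclude=[("H3", "l3")])

if __name__ == "__main__":
    print(len(ALL), len(WITHOUT_DOMINANT))
```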


Expression and folding properties 

The HuCAL genes were adapted to E. coli codon 
usage. While we indeed found superior expression 
behavior from most of the synthetic genes, this 
probably reflects favorable protein folding proper- 
ties (see below), although the avoidance of codons 
used only very rarely is at least a prerequisite for 
high expression yields. The consensus frameworks 
described here may be an interesting basis for elu- 
cidating the framework contributions to differences 
in folding yield during recombinant antibody 
expression and to thermodynamic stability. The 
absence of large differences in expression behavior 
between the consensus frameworks may improve 
library quality, since the probability of clones 
being eliminated during library selections due to 
very different effects of distinct antibody sequences 
on the bacterial cell physiology is minimized. 

In this context, it is interesting to note that the 
high-expressing humanized antibody hu4D5, 
which was shown to be expressed 10-50-fold better 
in E. coli than the murine parental antibody (Carter 
et al, 1992a), was designed using human consensus 
frameworks derived from the subfamilies V H 3 and 
VκI (Carter et al., 1992b). The human V H 3 germline
gene 3-23 (DP-47), which is most homologous 
(99% identity) to the HuCAL consensus amino 
acid sequence of the V H 3 germline subfamily, is 
also the most frequently used V H 3 germline gene 
(see Table 1) and it is very frequently found in 
antibody phage-display libraries based on human 
genes (Griffiths et al, 1994; Vaughan et al, 1996; 
Dorsam et al, 1997; Boel et al, 1998; Sheets et al, 
1998). Our theoretical analysis (unpublished 
results) showed that this framework has very few 
of the recognized sequence problems. Such pro- 
blem spots include exposed hydrophobic residues 
that might promote misfolding and aggregation 
(Nieba et al, 1997), non-Gly residues in positions 
with conserved positive phi angles, proline in position
H40 (Knappik & Plückthun, 1995) and the
disruption of the highly conserved charge cluster
around R H66/D H86 and R L61/D L82 (Proba et al.,
1998).

It is therefore reasonable to hypothesize that
consensus sequences, which are closely related to 
phylogenetically old progenitor genes, are better 
adapted to folding in an environment like the 
E. coli periplasm, where probably most of the fold- 
ing catalysts and chaperones, which normally act 
on the folding pathway in the ER lumen of the 
antibody producing B-cell, are absent. It is tempt- 
ing to speculate that a consensus sequence defines 
a point in sequence space from which the observed 
sequences have diverged through genetic drift 
until the function of the protein is no longer main- 
tained. This speculation is supported by exper- 
iments (Steipe et al, 1994), where a clear 
correlation between degree of deviation from the 
consensus sequence and loss of thermodynamic 
stability of a murine V L domain was found. Recent 
studies (Wörn & Plückthun, 1999) have shown
that, taking all available information into account,
very stable and well-expressing antibodies can be
engineered, suggesting that the β-sandwich framework
is, in principle, a highly stable scaffold. Yet,
most antibodies have diverged far from this point, 
first in the course of gene duplication during evol- 
ution of the locus, then in the V(D)J rearrangement 
where unfavorable CDR3s may be introduced, and 
finally in the somatic mutations, yielding antibody 
domains of very marginal biophysical integrity. 
The mouse repertoire is thought to be significantly 
larger than the human repertoire (Almagro et al, 
1998) and thus more deviations from the optimum 
are genetically encoded, partially explaining the 
difficulties in expressing antibody fragments 
derived from murine hybridomas. Several residues 
experimentally shown to be non-optimal (Spada 
et al, 1998; Proba et al, 1998) have been found to 
be encoded in some of the mouse germline genes, 
but in none of the human genes. Such residues are 
totally avoided in the present design. 

Affinity maturation 

The synthetic HuCAL genes were designed to 
contain unique restriction sites flanking the regions 
encoding the antigen binding loops, thereby mak- 
ing all six CDR regions accessible for diversifica- 
tion. The resulting modular gene structure, in 
combination with pre-built CDR library cassettes, 
will allow the rapid randomization of each CDR 
loop. We have constructed trinucleotide-based 
LCDR1 and HCDR2 cassettes using a design pro- 
cedure identical with that described here for CDR3 
cassettes (unpublished results). Hence an iterative 
randomization procedure can be envisaged, where 
the pool of binding sequences obtained after initial 
library selections can serve as starting material for 
the next iteration. Such a protocol would mimic 
the process of affinity maturation by somatic 
hypermutation observed during the natural 
immune response, even though the mechanism for 
achieving this would be different. It may be 
reasoned that this will be more efficient, as more of 
the mutations will be targeted to the region of 
interest. So far, the CDR walking process has been 
time-consuming, since the protocols and the CDR 
libraries had to be established for each individual 
antibody sequence. By using cassettes and the con- 
served restriction sites of the synthetic genes, how- 
ever, an optimization of pools is possible, and the 
procedure is much more convenient. It has been 
shown now by several groups that the process of 
CDR walking, i.e. the iterative randomization of 
CDRs followed by stringent selection protocols, 
improved binding affinity of distinct antibody 
sequences dramatically (Yang et al, 1995; Schier 
et al, 1996b; Barbas & Burton, 1996; Rosok et al, 
1998; Wu et al, 1998), and intrinsic affinities in the 
picomolar range could be obtained by this 
approach. 

Nevertheless, framework residues can have 
indirect effects on binding by affecting the CDR
conformations (Foote & Winter, 1992; Saul &
Poljak, 1993), and a complete refinement may have 
to include these regions as well, e.g. by gene shuf- 
fling (Patten et al., 1997) or ribosome display 
(Hanes et al, 1998). Recently, the latter approach 
has been applied to the HuCAL1 library, and
binders with sub-nanomolar affinities to several 
antigens have been obtained that do carry further 
mutations introduced by PCR (unpublished 
results). 

Trinucleotide mixtures for CDR libraries 

Using the 49 combined HuCAL frameworks, the 
initial libraries were created by randomizing two 
of the six CDR regions using trinucleotide building 
blocks. Sondek & Shortle (1992) first reported the 
use of a mixture of two trinucleotide phosphorami- 
dites, but found a coupling yield of only 4% and 
large differences of relative coupling ratios. 
Virnekäs et al. (1994) showed that coupling of trinucleotide
mixtures can be achieved with coupling
yields as high as 96-98.5 %, by carefully excluding 
traces of water during preparation of the phos- 
phoramidite mixtures for coupling. However, in a 
first experiment using eight different trinucleotides, 
the individual codons were introduced with differ- 
ent frequencies (between one and 15 times within 
63 positions being sequenced). No further improve- 
ment has been reported by other groups using 
similar building blocks (Lyttle et al, 1995; Ono 
et al, 1995; Kayushin et al, 1996). Braunagel & 
Little (1997) used the trinucleotides described by 
Kayushin et al (1996) in their approach to create a 
single-framework antibody library. However, no 
sequencing results were given to show the quality 
of the starting library or the distribution of individ- 
ual codons. 

We found that mixtures of trinucleotide phos- 
phoramidites can be coupled in excellent yields. 
Oligonucleotides with a length of more than 100 
bases and containing ten to 15 randomized pos- 
itions have been successfully synthesized. Further- 
more, no bias was found in most cases and 
trinucleotide-directed mutagenesis appears now to 
be the method of choice to achieve full control over 
the variability. 

The option of using sub-stoichiometric coup- 
ling steps during oligonucleotide synthesis opens 
up a novel way of creating diversity by 
sequence and by length variation in a single oli- 
gonucleotide. We used sub-stoichiometric coup- 
ling for the generation of and V H CDR3 
libraries, and indeed it was possible to create 
CDR3s of different length with this method. 
However, the distribution of different length var- 
iants was in all cases shifted to shorter library 
members than calculated, suggesting that the 
stepwise coupling yield calculated from measuring
the concentration of trityl cations, cleaved off
the 5'-end, is higher than the actual coupling
yield, i.e. the percentage of oligonucleotide
chains being elongated during the sub-stoichiometric
step. However, the parameters influencing
the outcome of the sub-stoichiometric approach 
have not been studied in detail. 
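
The shift towards shorter library members can be illustrated with a simple model. The sketch below is an added illustration under stated assumptions, not the calculation used for the libraries: each sub-stoichiometric step is taken to elongate a fixed fraction `p_elongated` of chains, independently of the other steps, so that the number of optional codons incorporated follows a binomial distribution. The values 0.5 and 0.4 are hypothetical.

```python
from math import comb


def length_distribution(n_optional, p_elongated):
    """P(k optional codons incorporated) for k = 0..n_optional (binomial model)."""
    return [
        comb(n_optional, k) * p_elongated**k * (1 - p_elongated) ** (n_optional - k)
        for k in range(n_optional + 1)
    ]


def mean_added(dist):
    """Mean number of optional codons incorporated."""
    return sum(k * p for k, p in enumerate(dist))


# Yield assumed from the trityl readout versus a hypothetical lower true yield.
EXPECTED = length_distribution(4, 0.5)
ACTUAL = length_distribution(4, 0.4)

if __name__ == "__main__":
    print([round(p, 4) for p in EXPECTED], mean_added(EXPECTED))
    print([round(p, 4) for p in ACTUAL], mean_added(ACTUAL))
```

In this model, overestimating the elongated fraction by ten percentage points already moves the mean from 2.0 to 1.6 incorporated codons and roughly doubles the share of the shortest variant, which is qualitatively the kind of shift described above.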

We decided initially to start with a diversifica- 
tion of CDR-L3 and CDR-H3, to imitate natural 
antibody generation. During the natural process of 
initial antibody generation, which results from gen- 
ome rearrangements in the developing B-cell, most 
of the initial diversification is located in the V H 
CDR3 region (VDJ-joining) and, to a lesser extent, 
the V L CDR3 (VJ-joining). In the 3D structures of 
antibodies, both CDR3s form the so-called inner 
ring of the antigen binding site, and most of the 
antigen contacts are formed by residues located 
there (see Figures 1 and 3). 

Comparison to semi-synthetic 
antibody libraries 

The use of defined frameworks as the basis for 
generating an antibody library has been described 
before. Initial work on randomizing just CDR-H3 
(Barbas et al., 1992) has since then been extended
to Vκ CDR3 (Barbas et al., 1993; Yang et al., 1995;
Söderlind et al., 1995) or to single frameworks with
all CDRs being randomized (Hayashi et al., 1994;
Iba & Kurosawa, 1997). Furthermore, sets of V H
genes extended with PCR primers that encode
CDR-H3 libraries have been combined with a
single V L gene (Nissim et al., 1994), or a limited set
of V L genes (De Kruif et al., 1995), or a randomized
repertoire of V L genes (Griffiths et al., 1994).

Most of the semi-synthetic human antibody 
libraries constructed so far focussed on exclusively 
randomizing CDR3s. For V H , in most approaches 
A H93/R H94 and D H101 were kept constant, and positions
H95 to H100z, and usually H102 as well,
were randomized (Hoogenboom & Winter, 1992; 
Barbas et al, 1993, 1994; De Kruif et al, 1995). The 
length of the CDR3s varied between six and 20 
residues, with a preference for loops with six to 14 
amino acid residues. De Kruif et al. (1995) con- 
structed a set of eight CDRs between eight and 17 
residues long, comprising completely and semi- 
randomized stretches. 

For Vκ CDR3, usually residues L92 to L96
were randomized (Barbas et al., 1993, 1994; Yang
et al., 1995; Söderlind et al., 1995). The length of
the CDR3s varied between seven and ten residues.
Similarly, Hayashi et al. (1994) randomized
11 residues (including residue L97 of framework
4) of Vλ CDR3 in their approach to construct a
one-framework library with all six CDRs being
randomized. In contrast, Griffiths et al. (1994)
used a whole set of 21 Vλ (as well as Vκ) germline
genes and added, via PCR, specific CDR
sequences comprising zero to five randomized
codons. In all cases, codons were randomized by
using mixtures of mononucleotides during oligonucleotide
synthesis.
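
The practical difference between randomizing with mononucleotide mixtures and with pre-built trinucleotides can be made concrete by counting codons. The sketch below is an added illustration and does not describe any of the cited libraries: NNN and NNK are common mononucleotide schemes assumed here for comparison, and `TRINUCLEOTIDES` is a hypothetical equimolar mixture with one codon per amino acid, omitting Cys.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residue_frequencies(codons):
    """Fraction of each encoded residue ('*' = stop) for an equimolar codon set."""
    counts = Counter(CODE[c] for c in codons)
    total = sum(counts.values())
    return {aa: Fraction(n, total) for aa, n in counts.items()}


NNN = ["".join(c) for c in product(BASES, repeat=3)]  # all four bases at each position
NNK = [c for c in NNN if c[2] in "GT"]                # third base restricted to G or T
# Hypothetical trinucleotide mixture: 19 codons, no Cys, no stop codon.
TRINUCLEOTIDES = [
    "GCG", "CGT", "AAC", "GAT", "CAG", "GAA", "GGC", "CAT", "ATT", "CTG",
    "AAA", "ATG", "TTT", "CCG", "AGC", "ACC", "TGG", "TAT", "GTG",
]

if __name__ == "__main__":
    for name, codons in (("NNN", NNN), ("NNK", NNK), ("trinucleotides", TRINUCLEOTIDES)):
        freqs = residue_frequencies(codons)
        print(name, "stop:", freqs.get("*", Fraction(0)), "Leu:", freqs["L"], "Trp:", freqs["W"])
```

With mononucleotide mixtures the residue frequencies follow the degeneracy of the genetic code and stop codons cannot be excluded entirely, whereas a trinucleotide mixture encodes exactly the chosen residues in the chosen proportions.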

In our CDR3 design, we had to decide whether 
to stay close to the encoded variety with a preference
for sequences actually found in selected antibodies
or whether to follow a more daring
approach. While, technically, both approaches are 
equally feasible, as it would depend only on the 
types of cassettes used, we opted to first examine 
CDR3 libraries close to the encoded variety. Even 
in a loop of this size, many combinations will be 
non-functional, and we wanted to secure a very 
high number of initial functional molecules. As 
library selection technology progresses, e.g. by the 
use of methods such as ribosome display (Hanes & 
Plückthun, 1997; Hanes et al., 1998), much larger
libraries will be screenable, and a larger set of 
variants may be simultaneously present, including 
those with structural defects. 

When using the known rearranged sequences as 
a guide, it becomes an important question to what 
degree they represent "frozen accidents", explain- 
able only by their evolutionary ancestry both at the 
germline and somatic level, or whether they are 
truly positively selected or are even due to genetic 
hotspots, encoded into the DNA sequence. The 
processes underlying somatic hypermutation are 
still not well understood. It was shown that heter- 
ologous genes replacing V gene segments undergo 
hypermutation in vivo as well (Yelamos et al, 
1995), and therefore it seems very unlikely that the 
V genes themselves determine at the genetic level 
where hypermutation occurs. A more reasonable 
explanation would be that selection determines 
which mutations finally survive. Various efforts 
have addressed this question (see, for example, 
Dörner et al., 1998).

Weighing all arguments, we decided to take the
natural distribution as our starting point. The modular
approach permits any desired optimization
strategy to be readily carried out, once primary
binders have been obtained, such as the introduction
of V L CDR1 and/or V H CDR2 cassettes into
single binders, or even pools of binders, since the
sequences share identical restriction endonuclease
sites adjacent to the CDRs. It would also be easily
possible, for example, to keep the CDR3s of the
selected pool of primary binders constant and
shuffle V H frameworks with randomized CDR2s.
Alternatively, new sets of CDR3 libraries can be
designed based on sequence motifs identified in
the pool of primary binders. Furthermore, chain
shuffling or even shuffling of elements such as
CDRs or frameworks can now be performed by
restriction digest and religation.

Since HuCAL is fully synthetic, it is always poss- 
ible to control the individual steps by analyzing 
the restriction pattern of individual clones or by 
sequencing, with artifacts being easily identified, 
whereas an immune repertoire cloned via PCR is 
more or less a black box. 

By these means, searching the sequence space of 
human antibodies will be much faster and more 
efficient than by using the conventional 
approaches. Finally, we expect that the careful 
analysis of selected sequences will yield a wealth
of structural information that can flow into sub- 
sequent versions of the library. 

Conclusions and perspective

The HuCAL concept is based on covering the 
essential features of the human antibody repertoire 
with a minimal number of different sequences, 
which are designed to facilitate extensive manipu- 
lation with standard protein engineering tech- 
niques. The 49 combinations of master genes have 
been cloned as scFv genes in both orientations and 
as F ab genes. Other formats like Fv fragments stabil- 
ized for example by disulfide-bridges (Glockshuber 
et al, 1990; Brinkmann et al, 1995; Rodrigues et al, 
1995) or fragments without any disulfide bonds 
(Wörn & Plückthun, 1998) useful for intrabody
approaches (Cattaneo & Biocca, 1999; Wörn et al.,
2000) are easily adaptable and can be analyzed on 
the level of the master genes before actual library 
generation. Libraries can be rapidly created by 
inserting pre-built CDR cassettes into each of the 49 
genes either separately or as a mixed sequence pool,
and the analysis of binding variants is facilitated by 
the fact that only small regions in the sequence are 
varied and that the three-dimensional models of all 
master frameworks have been built. It may there- 
fore be possible for the first time to investigate 
experimentally why nature has evolved the distinct 
structural motifs found in the human antibody 
repertoire, and whether there are correlations of 
antibody structure with antigen class, antibody affi- 
nity and specificity. Future versions of HuCAL may 
therefore be enriched with antigen-type specific fea- 
tures. 


Materials and Methods 

Bacterial strains, phages, vectors 

Molecular cloning was carried out using the E. coli
strains JM83 (Yanisch-Perron et al., 1985), XL1-Blue (Stratagene)
or Top10 (Invitrogen). For expression experiments,
JM83 was used. Phage-display libraries were
generated and propagated using E. coli TG1 as host
strain and M13K07 or VCSM13 as helper phage (all from
Stratagene). The products from gene synthesis were
cloned in pZero-1 (Invitrogen) or pCR-Script SK(+) (Stratagene)
for sequencing. The pBS vector series used for
antibody cloning and for expression analysis is a derivative
of the phage-display vector pAK100 (Krebber et al.,
1997). The vector pBS10 contains the mature bla gene
preceded by a region encoding the ompA signal
sequence, a FLAG tag and an EcoRI cloning site between
the XbaI/HindIII cloning sites of pAK100. The pBS10
vector was modified as follows in order to allow assembly
of synthetic antibody genes. First, an oligonucleotide
cassette encoding a synthetic phoA signal sequence (created
by annealing the oligonucleotides 05phoA and
03phoA; all oligonucleotides constructed during this
work are given in Table 2 of the Supplementary
Material) was inserted into the XbaI/EcoRI sites. The
resulting construct was designated pBS11. This phoA
gene fragment contained a unique SapI site, which was
later used for insertion of V H genes for the generation of
F ab fragments.

† ftp://ttwu.bme.nwu.edu/pub/database

The phoA gene fragment was extended by
inserting a cassette created by annealing the oligonucleotides
05phoA_F and 03phoA_F into pBS11 via SapI/EcoRI,
thereby introducing the short improved FLAG tag
(DYKDE; Knappik & Plückthun, 1994). The resulting
vector, designated pBS12, was later used for the assembly
of scFv genes in the H-L orientation as well as for
expression analysis. Second, the XbaI/EcoRI fragment
from pBS10 was replaced by a cassette created by
annealing the oligonucleotides 05stII and 03stII, thereby
introducing a stII signal sequence containing a unique
NsiI site, which was later used for cloning of V L genes
for the generation of F ab fragments. The resulting vector
was designated pBS13. The stII gene fragment was
extended by inserting a cassette created by annealing the
oligonucleotides 05stII_F and 03stII_F into pBS13 via
NsiI/EcoRI, thereby introducing the short improved
FLAG tag. The resulting vector, designated pBS14, was
later used for the assembly of scFv genes in the L-H
orientation. Vector pBS13b was constructed by removing
the MscI site in the cat resistance marker gene. The
phage display vector pIG10.3 is a derivative of pIG10
(Ge et al., 1995), where the first 249 codons of the mature
full-length gene III were deleted. Briefly, the EcoRI/HindIII
restriction fragment in the phagemid pIG10 was
replaced by the c-myc tag for detection with the monoclonal
antibody 9E10 (Munro & Pelham, 1986) followed
by an amber codon and the truncated version of the
gene III through PCR mutagenesis. The construction of
the pMorph vector series, which is compatible with the
HuCAL restriction sites and which was used for library
cloning, will be described elsewhere (unpublished
results). All vectors were constructed using site-directed
mutagenesis (Kunkel, 1985), recursive PCR (Prodromou
& Pearl, 1992) and overlap-extension PCR (Ge &
Rudolph, 1997), and all constructs were subsequently
verified by DNA sequencing (SequiServe, Vaterstetten,
Germany).


Collection of human antibody sequences 

Functional human germline sequences were downloaded
from Genbank (Benson et al., 1997), from the
Kabat database† and from Vbase‡. Rearranged
sequences were downloaded from Genbank and from 
the Kabat database. Kabat dump files were downloaded, 
variable domain amino acid sequences extracted and 
converted to the one-letter code. Sequences less than 90% 
complete or containing multiple undetermined residues 
in the regions of interest were eliminated. The automatic 
alignment generated by the program Pileup (Wisconsin 
Package, Version 8.1, 1995, Genetics Computer Group, 
Madison, WI, USA) was manually corrected to shift the 
gaps to the closest positions where they could be accom- 
modated in the three-dimensional structure. The 
sequence files were converted and imported into Micro- 
soft Excel®, where all subsequent alignments and anal- 
ysis procedures took place. All alignments, numbering 
and loop regions (L1-L3, H1-H3) are according to struc- 
tural criteria defined by Chothia and colleagues (see 
Chothia et al, 1992; Tomlinson et al, 1995; Williams et al, 
1996). CDRs were labeled as described by Kabat et al 
(1991), even though this does not always correspond to 
the structural definition. Amino acid sequences are given 
in the single letter code according to standard IUPAC 
nomenclature. Germline sequences are named according 
to accepted locus nomenclature for each segment 
(Giudicelli et al, 1997). 
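
The elimination of incomplete entries can be sketched as a simple filter. This is an added illustration under assumptions that the text does not spell out: completeness is taken as the fraction of aligned positions that are not gaps, "multiple" is read as more than one, and the function name, gap symbol `-` and placeholder `X` are hypothetical conventions.

```python
def is_usable(aligned_seq, region, min_complete=0.90, max_undetermined=1):
    """Decide whether an aligned sequence is kept for the analysis.

    aligned_seq uses '-' for missing positions and 'X' for undetermined
    residues; region is a (start, end) slice covering the region of interest.
    """
    determined = sum(1 for ch in aligned_seq if ch != "-")
    if determined / len(aligned_seq) < min_complete:
        return False  # less than 90 % complete
    start, end = region
    # Reject entries with more than one undetermined residue in the region.
    return aligned_seq[start:end].count("X") <= max_undetermined


if __name__ == "__main__":
    for seq in ("QVQLVQSGAEVKKPGASVKV", "QVQLVQSGAEVKKPGAS---", "QVQLVXXGAEVKKPGASVKV"):
        print(seq, is_usable(seq, (0, 10)))
```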

Statistical analysis of the coverage of
sequence space 


After alignment and numbering according to Kabat, 
the databases were normalized by checking for multiple 
entries of closely related sequences, which we thought 
would indicate an artificial bias towards a specific set of 
rearranged sequences. Subsequently, the rearranged and 
the germline sequences were grouped into the various 
subfamilies. To assign the nearest germline to each 
rearranged sequence, the number of identities of a given 
rearranged sequence to each germline sequence was 
scored from position 1 to 92 (V H ) or position 1 to 95 (Vκ)
or 1 to 95B (Vλ). If the result was ambiguous, e.g. the
rearranged sequence was equidistant from two or more 
germline sequences, or if the best hit gave less than 80 % 
identity, indicating either a very high level of somatic 
mutations or the origin from an at the time unknown 
germline gene, the rearranged sequence was omitted 
from the analysis. 
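
The assignment rule described above can be sketched in a few lines. This is an added illustration rather than the spreadsheet procedure actually used: it assumes that all sequences are already aligned so that equal string indices correspond to equal positions, and the function names and toy sequences are hypothetical. For real data `n_positions` would be the number of aligned positions scored (position 1 to 92 for V H, for instance).

```python
def count_identities(a, b, n_positions):
    """Number of identical residues over the first n_positions aligned positions."""
    return sum(1 for x, y in zip(a[:n_positions], b[:n_positions]) if x == y)


def assign_germline(rearranged, germlines, n_positions, min_identity=0.80):
    """Return the name of the nearest germline gene, or None if the sequence is omitted.

    A sequence is omitted when two or more germline genes tie for the best
    score, or when the best hit has less than min_identity identity.
    """
    scores = {
        name: count_identities(rearranged, seq, n_positions)
        for name, seq in germlines.items()
    }
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    if len(winners) > 1 or best / n_positions < min_identity:
        return None
    return winners[0]


# Toy ten-residue "germline" segments, for illustration only.
TOY_GERMLINES = {"g1": "QVQLVQSGAE", "g2": "EVQLVESGGG"}

if __name__ == "__main__":
    for seq in ("QVQLVQSGAD", "QVQLVESGAG", "AAAAAAAAAA"):
        print(seq, assign_germline(seq, TOY_GERMLINES, 10))
```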

By this analysis, the subfamilies that are used fre- 
quently by the human immune system were identified. 
The databases of rearranged sequences were used to cal- 
culate a consensus sequence for each frequently used 
subfamily. This was done by counting the number of 
amino acid residues used at each position (position 
variability) and subsequently identifying the amino acid 
residue most frequently used at each position. The con- 
sensus sequences were cross-checked with the consensus 
of the germline families to see whether the rearranged 
sequences were biased at certain positions towards 
amino acid residues that do not occur in the collected 
germline sequences, but this was found not to be the 
case. Subsequently, the CDR1 and CDR2 regions of the 
consensus sequences were replaced with the correspond- 
ing regions of the germline sequences that were most fre- 
quently used by the human immune system. For the 
framework 4 region, the consensus of all rearranged 
sequences was chosen. For each of these consensus 
sequences, the most homologous rearranged sequences 
were then identified and used for validating the consen- 
sus by identifying all framework residues that differed 
between the consensus and the most homologous 
rearranged sequences. These residues were regarded as 
artificial and checked by two means: first, the local con- 
text of the artificial residue was compared with the cor- 
responding stretch of all the rearranged sequences in the 
database; and second, the long-range interactions of 
amino acid residues at these positions were analysed. To 
this end, the structures of human antibodies available 
from the Brookhaven Protein Database were analyzed, 
and the contacts of all side-chains were tabulated. If a 
certain artificial residue in the consensus sequence was 
found in the local context of rearranged sequences, and 
if this residue was not involved in side-chain interactions 
according to the structural analysis, it was kept at this 
position. Otherwise, the next most common residue was 
chosen and analyzed as described above. Finally, the 
consensus sequences were compared to the corresponding
germline sequences and the number of differences
was tabulated.
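
The first step of this procedure, counting residues per position and taking the most frequent one, can be sketched as follows. This is an added illustration on toy data, assuming equal-length aligned sequences; it covers neither the replacement of CDR1 and CDR2 with germline regions nor the structural validation of artificial residues, and the function names are hypothetical.

```python
from collections import Counter


def position_counts(aligned):
    """One Counter of residues per alignment column."""
    return [Counter(column) for column in zip(*aligned)]


def consensus(aligned):
    """Most frequently used residue at each position."""
    return "".join(c.most_common(1)[0][0] for c in position_counts(aligned))


def position_variability(aligned):
    """Number of different residues used at each position."""
    return [len(c) for c in position_counts(aligned)]


# Toy alignment of three five-residue segments, for illustration only.
TOY_ALIGNMENT = ["QVQLV", "EVQLV", "QVKLV"]

if __name__ == "__main__":
    print(consensus(TOY_ALIGNMENT), position_variability(TOY_ALIGNMENT))
```

When two residues are equally frequent at a position, `most_common` returns the one encountered first, so a real analysis would need an explicit tie-breaking rule.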


† www.biochem.ucl.ac.uk/~roman/procheck/

Molecular modeling

The structures of the HuCAL domains were predicted
by homology modeling using the Homology, Biopolymer
and Discover modules of the program InsightII version
95 (Biosym/MSI, San Diego, CA). To align different
templates for the comparison of their conformation, a
least-squares fit of the Cα positions of residues H3-H7,
H19-H23, H34-H40 (gapped according to structural criteria,
not according to Kabat), H44-H50, H67-H71, H78-H82,
H88-H94 and H102-H108 (V H ) or L3-L7, L20-L24,
L33-L39, L43-L49, L62-L66, L71-L75, L84-L90 and L97-L103
(V L ) was performed. The experimental structures
displaying the highest degree of sequence similarity to
the different HuCAL constructs are listed in Table 1 of
the Supplementary Material. Structural differences
between these templates were analyzed to identify the
sequence differences responsible for the deviations. The
conformation of the dummy CDR3s was taken from the
structure of the humanized 4D5 version 8 (PDB entry
1FVC). Coordinates were assigned using the Homology
module and the resulting models checked for steric
clashes and cavities before energy minimization (module
Discover, CFF91 forcefields). The stereochemical quality
of the final domain models was evaluated with the program
PROCHECK† (Laskowski et al., 1993; Morris et al.,
1992).


Gene synthesis and assembly 

Consensus amino acid sequences were back-translated
into DNA sequences using the GCG software
package (Genetics Computer Group, Madison, WI,
USA) and a codon definition file that included only
the codons that are used frequently in E. coli‡. All
possible silent (and commercially available) restriction
sites based on version 501 of the REBASE list of
restriction enzymes (Roberts & Macelis, 1999§) were
subsequently identified in the resulting DNA
sequences and tabulated. These tables were used to
identify all cleavage sites that were located close to
the CDR and framework borders, and that could be
introduced into all genes of the three classes (V H , Vκ
or Vλ) simultaneously at the same position. Further
editing was done as described in Results. For each of
the 14 resulting genes, six overlapping oligonucleotides
were designed. Since both the CDR3 and the framework
4 gene segments were identical in all Vκ, Vλ
and V H genes, respectively, this part was constructed
only once in each case. The region of overlap was
chosen to give a theoretical T m of 58 °C (corresponding
to a ΔG of about −20 kcal/mol), and the 3' nucleotide
was chosen to be either C or G. The design was
examined and optimized in terms of potential stem-loop
formation, dimer formation and potential unspecific
hybridization sites with all other oligonucleotides
(duplex formation) using the VectorNTI® software
(Informax, Inc.). PCR assembly (Prodromou & Pearl,
1992) was performed by mixing 200 pmol of each of
the oligonucleotides in a 100 µl reaction volume containing
20 nmol of dNTPs and five units of Pfu polymerase
(Stratagene). After a first cycle with three
minutes at 94 °C, two minutes at 60 °C and one minute
at 72 °C using a hotstart procedure, 31 PCR cycles
were performed (one minute at 94 °C, two minutes at
60 °C and one minute at 72 °C), the products were
purified using the QIAgen PCR purification kit and
blunt-end ligated with either the pCR-Script KS(+)
(cut with SrfI) or the pZero-1 vector (cut with EcoRV).
Insert-containing clones were screened by blue-white
selection (pCR-Script KS(+)) or directly picked (pZero-1)
and sequenced.
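
The two computational steps at the start of this procedure, back-translation with a restricted codon set and the search for silent restriction sites, can be sketched as follows. This is an added illustration, not the GCG or REBASE workflow itself: `PREFERRED` is an assumed choice of one frequently used E. coli codon per residue rather than the authors' codon definition file, the function names are hypothetical, and the peptides in the usage lines are toy inputs. The recognition sequences used are those of EcoRI (GAATTC) and BssHII (GCGCGC).

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# All synonymous codons for each residue.
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

# Illustrative codon choice (assumption), one codon per amino acid.
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein):
    """Back-translate a protein using only the preferred codons."""
    return "".join(PREFERRED[aa] for aa in protein)


def translate(dna):
    """Translate an in-frame DNA string with the standard genetic code."""
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def silent_sites(protein, site):
    """Nucleotide offsets at which `site` can be encoded without changing `protein`."""
    hits = []
    for start in range(3 * len(protein) - len(site) + 1):
        first, last = start // 3, (start + len(site) - 1) // 3
        offset = start - 3 * first
        # Try every synonymous codon choice for the residues the site overlaps.
        for codons in product(*(SYNONYMS[aa] for aa in protein[first:last + 1])):
            if "".join(codons)[offset:offset + len(site)] == site:
                hits.append(start)
                break
    return hits


if __name__ == "__main__":
    print(back_translate("QVQLVESGGG"))
    print(silent_sites("EF", "GAATTC"), silent_sites("YYCAR", "GCGCGC"))
```

Running such a search over all consensus genes of one class, and keeping only offsets shared by every gene, would mirror the requirement that a site be introducible at the same position in all genes simultaneously.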

The seven synthesized V H genes covered the
sequences from the first unique 5' restriction site located
in the phoA signal sequence region (SapI) to the last
unique 3' restriction site located in the framework 3
region prior to the CDR3 (BssHII). All genes were synthesized
with their authentic N termini and without the
short FLAG sequence, which was added later during the
construction of scFv display vectors. The heavy chain
C H 1 domain (subtype IgG1, Genbank accession number
A49444) including the V H framework 4 region was
assembled using eight oligonucleotides (OCH1 to OCH8)
and inserted into pCR-Script KS(+). The C H 1 gene
sequence was designed for optimal E. coli codon usage.
Additionally, restriction sites for SalI and EcoRI were
incorporated at the 5'- and 3'-ends, respectively, and
most internal restriction sites were removed during the
gene design. In a second step, the V H dummy CDR3
region was inserted as a PstI/StyI cassette using the
oligonucleotides OHCDR3P and OHCDR3M. The V H gene
fragments (SapI/BssHII) were assembled with the
CDR3-framework 4-C H 1 sequence (BssHII/EcoRI) by a
three-fragment ligation with the vector pBS11 (SapI/EcoRI),
yielding seven Fd fragments for construction of F ab
expression vectors. The N-terminal FLAG tag was added
later for scFv constructions by cloning the Fd fragments
into the vector pBS12 using the restriction enzymes MfeI
and EcoRI.

The four synthesized VL kappa genes covered the sequences
from the unique 5' restriction site located in the stII
signal sequence region (NsiI) to the unique 3'
restriction site located in the framework 3 region prior
to the CDR3 (Eco57I). The human kappa constant domain Cκ
(Genbank accession number P01834) including the Vκ
framework 4 region, the VL dummy CDR3, and part of the Vκ
framework 3 region (including the Eco57I restriction
site) was synthesized using eight oligonucleotides (OCLk1
to OCLk8) and inserted into pCR-Script KS(+). The Cκ gene
sequence was optimized for E. coli codon usage, the
internal restriction sites except AccI were removed and
the restriction sites for BsiWI and StuI were
incorporated at the 5' and 3' ends, respectively. The Vκ
gene fragments (NsiI/Eco57I) were then assembled with the
CDR3-framework 4-Cκ sequence (SphI/Eco57I) by a
three-fragment ligation with the vector pBS13
(SphI/NsiI), yielding four kappa light chain fragments
for construction of Fab expression vectors.

The three synthesized VL lambda genes covered the
sequences from the unique 5' restriction site located in
the stII signal sequence region (NsiI) to the unique 3'
restriction site located in the framework 3 region prior
to the CDR3 (BbsI). All genes were synthesized using
their authentic N termini, i.e. without the
aspartate-isoleucine stretch encoded by an EcoRV site
used for the Vκ genes. The human lambda constant domain
Cλ1 (Genbank accession number P01842) including the Vλ
framework 4 region, the VL dummy CDR3, and part of the Vλ
framework 3 region was assembled as a BbsI/SphI fragment
by complete gene synthesis with 12 oligonucleotides. The
Vλ gene fragments (NsiI/BbsI) were assembled with the
CDR3-framework 4-Cλ sequence (BbsI/SphI) by a
three-fragment ligation with the vector pBS13b
(SphI/NsiI), yielding three lambda light chain fragments
for construction of Fab expression vectors. In order to
assemble VL-VH scFv vectors, the Vλ gene fragments were
further modified: the Vλ gene fragments were PCR
amplified from pBS13b using the forward primers
OLxFw1DIP (where x denotes the Vλ sub-family) and the
backward primer OLFw4M, and the PCR products were
blunt-end ligated into pBS14_scκ1H3 (a VL-VH scFv
expression vector constructed as described below), which
had been cut with EcoRV/BsiWI and made blunt-ended by
treatment with S1 nuclease. The resulting three plasmids
were named pBS14_scλ(1-3)H3 and contained Vλ genes where
the two N-terminal codons had been changed to the EcoRV
recognition sequence encoding aspartate-isoleucine, in
order to allow the same scFv cloning protocol as used for
the Vκ genes. These plasmids were used for assembly of
the VL-VH scFv expression vectors (see below). In order
to assemble VH-VL scFv vectors, a cassette constructed by
annealing the oligonucleotides OLEco5 and OLEco3 was
inserted into the lambda light chain containing vectors
pBS13b_Vλ(1-3)Cλ cut with MscI/EcoRI, thereby replacing
the CL constant domain gene fragment by an in-frame EcoRI
site. The resulting three plasmids were designated
pBS13b_Vλ(1-3)_E. After cutting these vectors with
XmaI/HindIII, the 3'-region of each of the three Vλ genes
including the in-frame EcoRI sequence was isolated and
inserted into the corresponding pBS14_scλ(1-3)H3 vectors,
thereby adding the 5' EcoRV and the 3' EcoRI sites to
each of the three Vλ genes. These genes were used to
assemble the VH-VL scFv expression vectors (see below).

Fab expression plasmids were constructed by combining
each of the heavy chain Fd fragments cut with SphI/EcoRI
and each of the light chain fragments cut with XbaI/SphI
with the pBS13 vector cut with XbaI/EcoRI in a
three-fragment ligation reaction. The 49 resulting
plasmids were verified by restriction enzyme digestions.
Here, the Vλ gene fragments contain their authentic N
termini, and there is no FLAG tag sequence attached to
the antibody Fab genes.
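
The figure of 49 plasmids follows from pairing each of the seven heavy chain genes with each of the seven light chain genes. A minimal sketch of that enumeration is given below; the gene labels are taken from the framework names in the Data Bank section, with ASCII "Vk" and "Vl" standing in for kappa and lambda (an assumption made for readability).

```python
from itertools import product

# Seven heavy chain and seven light chain master genes.
VH_GENES = ["VH1A", "VH1B", "VH2", "VH3", "VH4", "VH5", "VH6"]
VL_GENES = ["Vk1", "Vk2", "Vk3", "Vk4", "Vl1", "Vl2", "Vl3"]

# Every heavy chain is combined with every light chain.
fab_combinations = list(product(VH_GENES, VL_GENES))

if __name__ == "__main__":
    print(len(fab_combinations))
    for vh, vl in fab_combinations[:3]:
        print(f"{vh} + {vl}")
```

The same 7 x 7 pairing underlies the 49 scFv expression plasmids described in the text.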

The scFv expression plasmids in the orientation VL-VH
were constructed as follows: the Cκ gene fragment from
pBS13_Vκ2Cκ was removed by cutting the plasmid with
BsiWI/SphI and replaced by an oligonucleotide cassette
encoding a 20 amino acid residue linker plus the
additional restriction sites MfeI and EcoRI for later
insertion of the VH genes. The cassette was constructed
by annealing the oligonucleotides OLHLiP and OLHLiM.
Subsequently, the remaining Vκ and Vλ genes were inserted
as XbaI/BsiWI fragments and the VH genes were inserted as
MfeI/EcoRI fragments.

The 49 scFv expression plasmids in the orientation VH-VL
were constructed as follows: the CH1 gene fragment from
pBS12_VH3CH1 was removed by cutting the plasmid with
BlpI/EcoRI and replaced by an oligonucleotide cassette
encoding a 20 amino acid residue linker plus the
additional restriction site EcoRV for later insertion of
the VL genes. The cassette was constructed by annealing
the oligonucleotides OHLLiP and OHLLiM. Subsequently, the
Vκ and Vλ genes were inserted as EcoRV/EcoRI fragments
and the VH genes were inserted as XbaI/BlpI fragments.
These 49 vectors were used for expression analysis, and
the scFv genes were later used for library construction.


Expression analysis 

Growth curves and expression data were obtained
essentially as described (Knappik & Plückthun, 1995).
Briefly, E. coli JM83 cultures containing the appropriate
scFv expression vectors were grown at 30 °C and induced
with 1 mM IPTG. After two hours of expression, cells were
harvested, normalized to identical absorbance, lysed and
separated into soluble and insoluble cell fractions by
centrifugation. The fractions were assayed by reducing
SDS-PAGE, blotting and immunostaining using the anti-FLAG
antibody M1 (Sigma), and the amount of soluble and
insoluble scFv protein produced was quantified
densitometrically. The scFv gene H3κ2 was used as
internal control in each expression experiment.

Expression kinetics were measured as follows: E. coli
JM83 cells were transformed with the scFv genes cloned in
the expression vector pMorphx7_FS (unpublished results)
and grown in 1 l shaking-flask cultures at 30 °C. After
induction with 1 mM IPTG, 50 ml of culture was harvested
each hour, the cells were normalized to A = 50, lysed by
sonication, and the crude extracts were stored at -20 °C.
After ten hours of induction, the remaining culture
(500 ml) was harvested and lysed, the scFv fragment was
purified using Poros Streptactin affinity chromatography
(IBA, Göttingen, Germany), and the amount of purified
scFv was determined. The functional scFv expression yield
at the different time-points was then determined by ELISA
measurements, where the purified antibody fragment of
known concentration served as internal standard used to
calculate the scFv amount based on the ELISA signal
obtained.
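
The conversion of an ELISA signal into an scFv amount through a standard of known concentration can be sketched as a simple proportion. The sketch assumes that signal and amount are linearly related in the range measured and that a single standard point is used; neither detail is stated in the text, and the function name and all numbers are hypothetical.

```python
def scfv_amount_from_elisa(sample_signal: float,
                           standard_signal: float,
                           standard_amount: float) -> float:
    """Scale the known standard amount by the ratio of ELISA signals.

    Assumption: the signal is proportional to the amount of
    functional scFv over the measured range.
    """
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return sample_signal / standard_signal * standard_amount


if __name__ == "__main__":
    # Hypothetical hourly signals from crude extracts (arbitrary units).
    time_course = {1: 0.10, 2: 0.25, 3: 0.40}
    for hour, signal in time_course.items():
        amount = scfv_amount_from_elisa(signal, standard_signal=0.80,
                                        standard_amount=10.0)
        print(hour, amount)
```

In practice a dilution series of the standard would normally be fitted rather than a single point.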


CDR analysis and library design 

The aligned collections of rearranged human antibody VH
and VL sequences were used for analysis of CDR3 length
and composition. For analysis of VH CDR3s, all sequences
were grouped together because sequence alignment is not
possible in this highly diverse region. For Vκ and Vλ
CDR3s, the subfamilies were analyzed separately. Within
the individual alignments, the CDRs were grouped
according to CDR length. Assignment of the individual
groups to canonical structures was done according to the
rules described by Chothia et al. (1989). All analysis
was done using Microsoft Excel®.
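
The grouping of CDRs by length, followed by a per-position look at composition within each length group, can be sketched as follows. This is an illustration only: the analysis in the text was carried out in a spreadsheet, and the sequences and function names here are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_length(cdrs):
    """Group CDR sequences (strings of one-letter codes) by length."""
    groups = defaultdict(list)
    for seq in cdrs:
        groups[len(seq)].append(seq)
    return dict(groups)


def positional_composition(seqs):
    """Residue counts per position for sequences of equal length."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same length")
    return [Counter(column) for column in zip(*seqs)]


if __name__ == "__main__":
    example_cdr3s = ["QQYSSP", "QQYNSY", "QQHYTTPP", "MQALQT"]  # hypothetical
    for length, members in sorted(group_by_length(example_cdr3s).items()):
        print(length, members, positional_composition(members)[0])
```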


Synthesis of trinucleotide-containing 
oligonucleotides 

Synthesis of O-methyl trinucleotide phosphoramidites and
their application in automated DNA synthesis has been
described (Virnekas et al., 1994). Trinucleotide mixtures
were prepared by mixing appropriate stoichiometric
amounts of solid phosphoramidites, assuming equal
reactivities of all 20 trinucleotides. The mixtures were
dried under argon and dissolved to yield 0.1 M solutions
as described (Virnekas et al., 1994). Automated synthesis
was performed on an Applied Biosystems DNA synthesizer
380B. The synthesis reagents were obtained from Applied
Biosystems and MWG (Ebersberg, Germany). All
trinucleotide-based syntheses were performed on columns
with polystyrene support, 1000 Å, 40 nmol (Applied
Biosystems, art. 401072 to 401075). For synthesizing
stretches with mononucleotide building blocks of the
oligonucleotides, conventional mononucleotide
O-cyanoethyl phosphoramidites and the standard synthesis
cycle SSCEAF (single coupling, 15 seconds wait step) were
used. When coupling trinucleotide mixtures
stoichiometrically, the standard cycle was changed to
double coupling, including a 100 seconds wait step after
the first, and a 400 seconds wait step after the second
coupling. For sub-stoichiometric couplings, the time for
delivering activated phosphoramidite solution to the
column was reduced to achieve approximately 50% coupling
yield. If substoichiometric coupling rates were much
higher or lower than 50%, either the time was adjusted
for the subsequent couplings to obtain an average yield
of 50% over all substoichiometric couplings or an
additional substoichiometric coupling step was added.
Deprotection of the oligonucleotides was performed as
described (Virnekas et al., 1994). All
trinucleotide-containing oligonucleotides synthesized for
CDR3 library generation are given in Table 3 of the
Supplementary Material.
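
The adjustment toward an average yield of 50% over all substoichiometric couplings comes down to simple arithmetic, sketched below. The function name and the example yields are hypothetical, and the sketch covers only the averaging rule, not how delivery time maps onto coupling yield.

```python
def required_remaining_yield(observed_yields, total_couplings, target=0.5):
    """Mean yield the remaining couplings must reach so that the
    average over all substoichiometric couplings equals the target.

    A result outside the range 0..1 means the target cannot be met
    by adjusting the remaining steps alone; the text mentions adding
    an extra substoichiometric coupling step as the alternative.
    """
    remaining = total_couplings - len(observed_yields)
    if remaining <= 0:
        raise ValueError("no couplings left to adjust")
    return (target * total_couplings - sum(observed_yields)) / remaining


if __name__ == "__main__":
    # Hypothetical run: the first two of four couplings overshot 50%.
    print(required_remaining_yield([0.7, 0.6], total_couplings=4))
```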


Cassette preparation 

The oligonucleotides were resuspended in TE buffer and
purified with an S200 column (Pharmacia) according to the
supplier's manual. The complementary strand was
synthesized with Klenow polymerase (New England Biolabs).
Approximately 5 nmol of oligonucleotide was mixed with a
cassette-specific corresponding primer at a ratio of
1:1.2, respectively, heated for ten minutes to 80 °C and
slowly cooled to room temperature. Then 10 µl of a 10 mM
dNTP mixture, 15 µl of Klenow buffer, 2 µl of Klenow
polymerase and water to 150 µl final volume were added.
The fill-in reaction was performed at 37 °C for two hours
and purified with a Nick Spin column according to the
supplier's manual (Pharmacia Biotech). The fill-in
reaction was checked by an analytical FMC agarose gel
(Biomol). To amplify the fill-in products, PCR reactions
were performed using 1 µl of the fill-in reaction
mixtures (approximately 25 pmol) and 100 pmol of each
primer (fill-in primer plus second cassette-specific
primer) in each case (30 cycles: one minute at 94 °C, one
minute at 54 °C, one minute at 72 °C). The PCR mixtures
were purified with a Nick Spin column. The
oligonucleotide library cassettes were prepared for
ligation by adding 30 µl of buffer to 100 µl of the
purified PCR product, 150 units of each of the
corresponding restriction enzymes, and water to a final
volume of 300 µl, and by digesting overnight at 37 °C.
The cassettes were purified on 4% FMC agarose gels
(Biomol), and recovered from the gel via BIOTRAP elution
(Schleicher & Schuell, Germany) according to the
supplier's manual (approximately two hours at
100 V/50-70 mA). The solutions containing the cassettes
were desalted with Nick Spin columns. The quality of the
cassettes was checked by analytical FMC agarose gels
(4%).


Generation of the HuCAL.1 library 

Template VH vectors were created by inserting the seven
HuCAL VH master genes as Fd genes from the vector pBS13
into the display vector pMorph7 (unpublished results).
The VH CDR3 sequences were then replaced by a 1220 bp
dummy fragment containing the β-lactamase gene, thereby
facilitating subsequent steps for vector fragment
preparation. The template VH vectors were cut with
StyI/HindIII to remove the CH1 gene fragment, and the
vector fragments were purified. At this step, the two
template vector fragments encoding the VH1A and VH1B
master genes were mixed in an equimolar ratio, resulting
in six VH vector templates.

Template VL vectors were constructed by first inserting
the HuCAL scFv master genes containing the HuCAL H3 gene
in combination with all seven VL genes into the vector
pMorph4, and second, replacing the VL CDR3 sequences by
the β-lactamase gene dummy sequence as described above.
The resulting seven template VL vectors were then
purified, and the 1220 bp dummy fragment was removed by
cutting with BbsI/MscI for the Vκ gene-containing vectors
and BbsI/HpaI for the Vλ gene-containing vectors. The
prepared trinucleotide cassettes encoding the VL CDR3
libraries were then ligated separately with the seven VL
template vector fragments (25 fmol of each vector was
ligated with 250 fmol of each cassette for three hours at
room temperature), and the ligation mixtures were
electroporated in 0.9 ml of E. coli TG1 cells, yielding
altogether 1.14 × 10^7 independent colonies. The colonies
were scraped off the selection plates, and the VL CDR3
libraries were stored in 20% (w/v) glycerol at -80 °C.
Phagemid DNA of the four Vκ and the three Vλ libraries
was prepared and two pools were created by mixing the
four Vκ (κmix) and the three Vλ (λmix) libraries in an
equimolar ratio. The two DNA pools were treated with
StyI/HindIII, the VL gene libraries were purified using
agarose gel electrophoresis and 75 fmol of each pool was
ligated with 25 fmol of each of the six VH template
vectors (see above), and electroporated in 0.3 ml of
E. coli TG1 cells, resulting in altogether 2.3 × 10^7
colonies for the 12 library pools. In the final step,
these 12 libraries were prepared as DNA, cut with
BssHII/StyI to remove the β-lactamase dummy gene inside
the VH CDR3 region, and the two VH CDR3 trinucleotide
library cassettes (HCDR3a and HCDR3b) were inserted
separately by ligation using the same conditions as
above. After electroporation into 7.2 ml of E. coli TG1
(12 electroporations for each library), we obtained
altogether 2.1 × 10^9 independent colonies. The diversity
for each pool was between 0.6 × 10^8 (VH2λmix) and
3.9 × 10^8 (VH6κmix). The colonies were scraped off the
selection plates, and the 24 HuCAL1 libraries were stored
as aliquots in 20% glycerol at -80 °C.
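
The pool and library counts in this section follow from a short combinatorial bookkeeping, sketched below: six VH templates (VH1A and VH1B being mixed) times two VL pools give 12 pools, and each pool receives two VH CDR3 cassettes, giving 24 libraries. The label strings and the helper function are hypothetical names chosen for illustration.

```python
from itertools import product

# Six VH vector templates: VH1A and VH1B were mixed into one template.
VH_TEMPLATES = ["VH1A+VH1B", "VH2", "VH3", "VH4", "VH5", "VH6"]
VL_POOLS = ["kappa_mix", "lambda_mix"]
HCDR3_CASSETTES = ["HCDR3a", "HCDR3b"]

pools = [f"{vh}/{vl}" for vh, vl in product(VH_TEMPLATES, VL_POOLS)]
libraries = [f"{pool}/{cassette}"
             for pool, cassette in product(pools, HCDR3_CASSETTES)]


def insert_to_vector_ratio(insert_fmol: float, vector_fmol: float) -> float:
    """Molar ratio of insert to vector in a ligation."""
    return insert_fmol / vector_fmol


if __name__ == "__main__":
    print(len(pools), len(libraries))
    print(insert_to_vector_ratio(250, 25))  # VL CDR3 cassette ligation
    print(insert_to_vector_ratio(75, 25))   # VL pool with VH template
```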


Data Bank accession numbers 

The coordinates of the 14 framework models have been
deposited in the RCSB Protein Data Bank, entries 1DGX
(Vκ1), 1DH4 (Vκ2), 1DH5 (Vκ3), 1DH6 (Vκ4), 1DH7 (Vλ1),
1DH8 (Vλ2), 1DH9 (Vλ3), 1DHA (VH1A), 1DHO (VH1B), 1DHQ
(VH2), 1DHU (VH3), 1DHV (VH4), 1DHW (VH5) and 1DHZ (VH6).